Distributed storage system and data processing method

ABSTRACT

This application provides a storage device, a distributed storage system, and a data processing method, and belongs to the field of storage technologies. In this application, an AI apparatus is disposed inside a storage device, so that the storage device has an AI computing capability. In addition, the storage device further includes a processor and a hard disk, and therefore further has a service data storage capability. Therefore, convergence of storage and AI computing power is implemented. An AI parameter and service data are transmitted inside the storage device through a high-speed interconnect network without a need of being forwarded through an external network. Therefore, a path for transmitting the service data and the AI parameter is greatly shortened, and the service data can be loaded nearby, thereby accelerating loading.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. patent application Ser. No. 17/677,864, filed on May 7, 2020, which is a continuation of International Patent Application No. PCT/CN2020/088871, filed on May 7, 2020, which claims priority to Chinese Patent Application No. 201910779723.9, filed on Aug. 22, 2019, and Chinese Patent Application No. 201911109237.2, filed on Nov. 13, 2019. All of the aforementioned patent applications are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

This application relates to the field of storage technologies, and in particular, to a storage device, a distributed storage system, and a data processing method.

BACKGROUND

With development of storage technologies and artificial intelligence (artificial intelligence, AI) technologies, a storage device may store service data for AI computing. The service data may be, for example, a sample set used for model training, such as a face image set used for training a facial recognition model, an audio sample set used for training a speech recognition model, or a sample text used for training a semantic recognition model.

The face image set is used as an example. A single server cannot meet a computing requirement due to a large amount of to-be-computed data. Therefore, a current mainstream architecture is a cluster architecture with a plurality of devices. This architecture includes an AI cluster, a storage cluster, and a switch. The AI cluster includes a plurality of AI servers for AI computing. The storage cluster includes a plurality of storage devices for storing service data. The switch is configured to forward service data between an AI server and a storage device. In an AI computing process, the AI server establishes a remote network connection to the storage device according to the transmission control protocol/internet protocol (transmission control protocol/internet protocol, TCP/IP). When the AI server needs to obtain service data required for AI computing, the AI server sends a data obtaining request to the storage device through the switch. After receiving the data obtaining request, the storage device sends the stored service data to the AI server through the switch. Then, the AI server loads the service data into a memory for AI computing.

In the foregoing architecture, the service data needs to be sent from the storage device to the switch through a network, and then sent from the switch to the AI server through the network, so that the AI server can obtain the service data to perform AI computing. Therefore, the path for obtaining service data in an AI computing process is very long. Consequently, the service data is obtained at a very slow speed and with low efficiency.

SUMMARY

Embodiments of this application provide a storage device, a distributed storage system, and a data processing method, to resolve a technical problem in a related technology that service data is obtained at a very slow speed and with low efficiency. The technical solutions are as follows:

According to a first aspect, a storage device is provided, including a processor, a hard disk, and an AI apparatus. The AI apparatus communicates with the processor through a high-speed interconnect network. The processor is configured to: receive service data, and store the service data in the hard disk. The AI apparatus is configured to: send a data obtaining request to the processor to obtain the service data, and perform AI computing on the service data.

The AI apparatus is disposed inside the storage device provided in this embodiment, so that the storage device can provide an AI computing capability through the AI apparatus and provide a service data storage capability through the processor and the hard disk in the storage device, thereby implementing convergence of storage and AI computing power. When AI computing needs to be performed, the service data is transmitted inside the storage device through the high-speed interconnect network without a need of being forwarded through an external network. Therefore, a path for transmitting the service data is greatly shortened, and the service data can be loaded nearby, thereby accelerating loading.

Optionally, the data obtaining request includes a first data obtaining request. The processor is configured to: in response to the first data obtaining request, obtain the service data from the hard disk, and send the service data to the AI apparatus. In this manner, the AI apparatus obtains the service data nearby. The storage device includes the AI apparatus, the processor, and the hard disk. Therefore, when the AI apparatus needs to obtain the service data, the AI apparatus sends the data obtaining request to the processor. The processor in the storage device obtains the service data from the hard disk, and sends the service data to the AI apparatus, so that the AI apparatus can locally obtain the service data. This avoids communication overheads caused by requesting the service data from a remote storage device through a network, and shortens a delay of obtaining the service data.
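
For illustration only (this sketch is not part of the claimed solution), the following minimal Python fragment models a first data obtaining request that is handled entirely inside one storage device; all class, method, and variable names are hypothetical.

    class HardDisk:
        def __init__(self):
            self.blocks = {}                     # address -> service data

        def read(self, address):
            return self.blocks[address]

    class Processor:
        def __init__(self, hard_disk):
            self.hard_disk = hard_disk

        def handle_first_data_obtaining_request(self, address):
            # The processor reads the service data from the hard disk and
            # returns it over the high-speed interconnect network.
            return self.hard_disk.read(address)

    class AIApparatus:
        def __init__(self, processor):
            self.processor = processor

        def obtain_and_compute(self, address):
            # The request never leaves the storage device: no external
            # network hop, so the transmission path stays short.
            data = self.processor.handle_first_data_obtaining_request(address)
            return "AI computing on " + data

    disk = HardDisk()
    disk.blocks[0x10] = "face image batch"
    ai = AIApparatus(Processor(disk))
    print(ai.obtain_and_compute(0x10))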

Optionally, the data obtaining request includes a second data obtaining request. The processor is configured to send metadata of the service data to the AI apparatus in response to the second data obtaining request. The metadata is used to indicate an address of the service data. The AI apparatus is configured to: when the metadata indicates that the service data is located in the storage device, send a first data access request to the hard disk. The first data access request includes the metadata. The hard disk is configured to: obtain the service data based on the metadata, and write the service data into the AI apparatus through DMA. In this manner, DMA pass-through between the AI apparatus and the hard disk can be implemented. A DMA path is established between the AI apparatus and the hard disk, so that the AI apparatus and the hard disk can quickly exchange the service data with each other through the DMA path. This accelerates service data loading by the AI apparatus, increases an amount of service data that can be simultaneously processed by the AI apparatus, reduces communication overheads for transmitting an AI parameter between AI apparatuses, and accelerates AI training.

Optionally, the data obtaining request includes a third data obtaining request. The processor is configured to send metadata of the service data to the AI apparatus in response to the third data obtaining request. The metadata is used to indicate an address of the service data. The AI apparatus is configured to: when the metadata indicates that the service data is located in another storage device, send a second data access request to the another storage device. The second data access request includes the metadata. In this optional manner, RDMA pass-through between an AI memory in the AI apparatus and the another storage device is implemented, and the AI memory and the another storage device quickly exchange the service data with each other. This accelerates AI training.
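
The metadata-based routing in the two preceding optional manners can be illustrated with the hypothetical Python sketch below: the AI apparatus asks the processor only for metadata, then reads the data through DMA if it is local or over the network if it is remote. The device identifier, the address field, and both callbacks are assumptions made for this example.

    LOCAL_DEVICE_ID = "storage-device-1"

    def route_data_access(metadata, local_dma_read, remote_read):
        if metadata["device"] == LOCAL_DEVICE_ID:
            # First data access request: the hard disk writes the data
            # into the AI memory through DMA, bypassing the processors.
            return local_dma_read(metadata["address"])
        # Second data access request: fetch from the other storage device.
        return remote_read(metadata["device"], metadata["address"])

    data = route_data_access(
        {"device": "storage-device-1", "address": 0x20},
        local_dma_read=lambda addr: "DMA read at " + hex(addr),
        remote_read=lambda dev, addr: "network read " + dev + ":" + hex(addr),
    )
    print(data)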

Optionally, the storage device further includes a memory. The processor is further configured to obtain a segment of memory space from the memory through division and reserve the segment of memory space for the AI apparatus. In this optional manner, the AI apparatus can borrow the memory in a storage apparatus to perform AI computing, so that available memory space of the AI apparatus is expanded, and the AI apparatus can perform AI computing in larger memory. This improves AI computing efficiency.

Optionally, the AI apparatus includes an AI processor and an AI memory. The AI processor is configured to: when an available capacity of the AI memory reaches a preset threshold, send a memory application request to the processor. The available capacity of the AI memory is determined by a specified batch size. The memory application request is used to request the processor to obtain a segment of memory space from the memory through division and reserve the segment of memory space for the AI apparatus. In this optional manner, the AI processor may perform training by using memory space of the memory. Because larger available memory space allows a larger batch size for AI training, an amount of service data that can be processed by the AI apparatus in one batch can be increased, communication overheads for exchanging an AI parameter between different AI apparatuses can be reduced, and AI training can be accelerated. Experiments show that, if AI training is performed only through the AI memory, the maximum batch size is 256, whereas in this manner the batch size may be set to 32000. Therefore, the batch size is significantly increased.
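
As a rough illustration of this memory borrowing (capacities, the threshold, and the callback are assumptions, not values from this application), a Python sketch:

    AI_MEMORY_CAPACITY = 256          # samples the AI memory alone can hold
    PRESET_THRESHOLD = 0.9            # borrow once 90% of AI memory is used

    def plan_batch(batch_size, borrow_from_processor):
        used = min(batch_size, AI_MEMORY_CAPACITY)
        if batch_size > AI_MEMORY_CAPACITY * PRESET_THRESHOLD:
            # Memory application request: ask the processor to carve out a
            # segment of the storage device's memory for the AI apparatus.
            borrowed = borrow_from_processor(batch_size - used)
            return used + borrowed
        return used

    capacity = plan_batch(32000, borrow_from_processor=lambda need: need)
    print("samples per batch:", capacity)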

In a related technology, the memory in the storage device has a fixed capacity, and consequently there is frequently insufficient memory for storing the service data. However, in this optional manner, the storage apparatus can borrow the AI memory in the AI apparatus to read/write the service data, so that available memory space of the storage apparatus is expanded, and the storage apparatus can store the service data in larger memory. This shortens a service data read/write time, and improves service data read/write efficiency.

Optionally, the AI apparatus includes an AI processor. The AI processor is configured to: divide a computing task into at least two subtasks, and send a first subtask in the at least two subtasks to the processor. The processor is further configured to: execute the first subtask, and send a computing result to the AI processor. In this optional manner, the computing power of the AI processor and the computing power of the processor are coordinated, and the AI processor can borrow the computing power of the processor in the storage device to increase its own computing power. This accelerates AI computing performed by the AI processor.

Optionally, the AI processor is further configured to: before dividing the computing task into the at least two subtasks, determine that the computing power of the AI processor is insufficient. In this optional manner, when determining that the computing power of the AI processor is insufficient, the AI processor can borrow the computing power of the processor to process AI computing. This breaks a bottleneck of insufficient computing power resources in an AI training process.

Optionally, the processor is configured to: divide a computing task into at least two subtasks, and send a second subtask in the at least two subtasks to the AI processor. The AI processor is further configured to: execute the second subtask, and send a computing result to the processor. In this optional manner, the computing power of the AI processor and the computing power of the processor are coordinated, and the processor in the storage device can borrow the computing power of the AI processor to increase its own computing power. This accelerates service data read/write performed by the processor in the storage device.
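
The subtask division used in both directions of this computing power coordination can be sketched as follows (illustration only; Python threads stand in for the processor and the AI processor, and the task split is hypothetical):

    from concurrent.futures import ThreadPoolExecutor

    def divide_computing_task(samples, n_subtasks=2):
        step = (len(samples) + n_subtasks - 1) // n_subtasks
        return [samples[i:i + step] for i in range(0, len(samples), step)]

    def execute_subtask(subtask):
        return sum(subtask)          # stand-in for real AI computing

    samples = list(range(8))
    first, second = divide_computing_task(samples)
    with ThreadPoolExecutor(max_workers=2) as pool:
        # One side keeps a subtask and offloads the other, then merges
        # the two computing results.
        results = list(pool.map(execute_subtask, [first, second]))
    print("merged computing result:", sum(results))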

Optionally, the AI memory communicates with the memory in the storage device through a memory fabric.

Optionally, the AI memory communicates with an AI memory in another storage device through a memory fabric.

Optionally, the memory in the storage device communicates with a memory in another storage device through a memory fabric.

Optionally, the memory in the storage device communicates with an AI memory in another storage device through a memory fabric.

Unified scheduling of a memory and an AI memory in one storage device, unified scheduling of memories in different storage devices, and unified scheduling of AI memories in different storage devices can be implemented through the memory fabric. This improves memory resource scheduling and use efficiency of a storage system.
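
A hypothetical Python sketch of such unified scheduling (the pool class, member names, and capacities are assumptions for illustration): every memory and AI memory registers with one pool, and segments are granted from whichever member currently has free space.

    class MemoryFabricPool:
        def __init__(self):
            self.members = {}        # member name -> free capacity in MiB

        def register(self, name, free_mib):
            self.members[name] = free_mib

        def allocate(self, need_mib):
            for name, free in self.members.items():
                if free >= need_mib:
                    self.members[name] = free - need_mib
                    return name      # segment granted from this member
            raise MemoryError("no member of the fabric has enough space")

    pool = MemoryFabricPool()
    pool.register("device1.memory", 1024)
    pool.register("device1.ai_memory", 512)
    pool.register("device2.memory", 2048)
    print("segment granted from:", pool.allocate(1536))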

Optionally, the AI apparatus communicates with the processor through a high-speed serial computer expansion bus standard (peripheral component interconnect express, PCIe for short) bus.

Optionally, the processor is configured to: when an available capacity of the memory reaches a preset threshold, send a memory application request to the AI processor. The memory application request is used to request the AI processor to obtain a segment of memory space from the AI memory through division and reserve the segment of memory space for the processor.

Optionally, the AI apparatus further includes an AI computing power unit. The AI computing power unit is specifically configured to: obtain the service data from the AI memory, and perform AI computing.

Optionally, the AI processor is further configured to obtain a segment of memory space from the AI memory through division and reserve the segment of memory space for the processor in the storage device.

Optionally, the AI memory serves as a first level and the memory serves as a second level, to perform layered AI parameter caching. A priority of the first level is higher than a priority of the second level. In this manner, the memory and the AI memory are layered, and an AI parameter is preferentially cached in the AI memory.
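
For illustration, a minimal two-level AI parameter cache in Python, assuming hypothetical capacities and names: the AI memory is the first (higher-priority) level and the storage device's memory is the second level, with the oldest entries demoted on overflow.

    from collections import OrderedDict

    class LayeredParameterCache:
        def __init__(self, ai_capacity, mem_capacity):
            self.levels = [(OrderedDict(), ai_capacity),   # first level
                           (OrderedDict(), mem_capacity)]  # second level

        def put(self, key, value):
            cache, capacity = self.levels[0]
            cache[key] = value                 # prefer the AI memory
            if len(cache) > capacity:
                # Demote the oldest AI parameter to the second level.
                old_key, old_value = cache.popitem(last=False)
                self.levels[1][0][old_key] = old_value

        def get(self, key):
            for cache, _ in self.levels:       # search by priority
                if key in cache:
                    return cache[key]
            return None

    cache = LayeredParameterCache(ai_capacity=2, mem_capacity=8)
    for i in range(3):
        cache.put("w%d" % i, i)
    print(cache.get("w0"))                     # served from the second level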

According to a second aspect, a distributed storage system is provided, including a plurality of storage devices. The plurality of storage devices include a first storage device. The first storage device includes a first processor, a first hard disk, and a first AI apparatus. The first AI apparatus communicates with the first processor through a high-speed interconnect network. The first processor is configured to: receive service data, and store the service data in the first hard disk. The first AI apparatus is configured to: send a data obtaining request to the first processor to obtain the service data, and perform AI computing on the service data.

In the distributed storage system provided in this embodiment, two AI apparatuses in different storage devices exchange an AI parameter with each other through a first network, and two storage apparatuses in the different storage devices exchange service data with each other through a second network, to collaboratively perform AI computing based on the AI parameter and the service data. Storage capabilities and AI computing power of a plurality of storage devices are converged, so that an overall storage capability and AI computing power of the system can be increased.

Optionally, the plurality of storage devices further include a second storage device. The second storage device is configured to transmit the service data to the first storage device through a second network. The second storage device is further configured to transmit an AI parameter to the first storage device through a first network. The AI parameter is used to perform AI computing on the service data. The first network and the second network are deployed, so that the AI parameter can be transmitted through the first network, and the service data can be transmitted through the second network. Therefore, for the system, a network resource used to forward an AI parameter can be separated from a network resource used to forward other data, so as to prevent original storage network resources of the storage device from being occupied during AI parameter transmission, thereby preventing network transmission performance of the storage device from being deteriorated when a network bandwidth of the storage device is occupied in an AI computing process. In addition, the first network can be dedicated to forwarding AI-related service data. Therefore, based on the network, impact on networking of an existing service data center or a storage device cluster can be avoided.

Optionally, the second storage device is further configured to transmit other service data to the first storage device through the first network. Optionally, when a quantity of network resources of the second network is less than a specified storage network resource threshold, the other service data is transmitted between the first storage device and the second storage device through the first network. In this optional manner, a new path is provided for service data exchange between storage apparatuses. When network resources of the second network are insufficient, the first network is used to exchange the service data, and the first network may be used as a newly added path for forwarding the service data. Therefore, the service data can be transmitted through the second network or the first network. This increases a network bandwidth for transmitting the service data, shortens a delay of exchanging the service data, accelerates service data exchange, and accelerates AI computing.

Optionally, the second storage device is further configured to transmit another AI parameter through the second network. The another AI parameter is used to perform AI computing on the other service data.

Optionally, when a quantity of network resources of the first network is less than a specified AI network resource threshold, the another AI parameter is transmitted between the first AI apparatus and the second AI apparatus through the second network. In this optional manner, a new path is provided for AI parameter exchange between AI apparatuses. When network resources of the first network are insufficient, the second network is used to exchange the AI parameter. This increases a network bandwidth for transmitting the AI parameter, shortens a delay of exchanging the AI parameter, accelerates AI parameter exchange, and accelerates AI computing.
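
The path selection described in these two optional manners can be summarized in a short sketch (illustration only; the thresholds and traffic kinds are assumptions): AI parameters prefer the first network and service data prefers the second, but either may overflow onto the other network when its preferred network drops below a resource threshold.

    AI_NETWORK_THRESHOLD = 100        # free bandwidth units, assumed
    STORAGE_NETWORK_THRESHOLD = 100

    def pick_network(kind, free_first, free_second):
        if kind == "ai_parameter":
            if free_first < AI_NETWORK_THRESHOLD:
                return "second network"   # borrow the storage network
            return "first network"
        if free_second < STORAGE_NETWORK_THRESHOLD:
            return "first network"        # borrow the AI parameter network
        return "second network"

    print(pick_network("ai_parameter", free_first=20, free_second=500))
    print(pick_network("service_data", free_first=500, free_second=20))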

Optionally, the first storage device further includes a first memory, the first AI apparatus includes a first AI processor and a first AI memory, the second storage apparatus includes a second processor and a second memory, and the second AI apparatus includes a second AI memory. The first AI processor is configured to: send a network resource request of the second network to the second processor, send a memory RDMA access request to the second processor, read the other service data from the first memory, and send the other service data to the second AI apparatus through the second network. The second AI apparatus writes the other service data into the second memory. In this optional manner, memory pass-through between the first storage device and the second storage device is implemented, and a memory in the first storage device and a memory in the second storage device can exchange the service data with each other through RDMA. Therefore, processing overheads of the first processor and the second processor can be avoided, and the service data is directly transmitted from the first memory to the second memory. This accelerates service data exchange, and improves service data exchange efficiency.

Optionally, the system further includes a management apparatus. The management apparatus is configured to: receive a first job request; determine distribution of a to-be-trained dataset based on the first job request, where the dataset includes the service data; and when determining that the service data is distributed on the first storage device, send a first computing request to the first storage device. The first computing request is used to request the first AI apparatus to perform AI computing on the service data. In this optional manner, the management apparatus selects the first storage device in which the service data is located, to provide AI computing, and the first storage device may obtain the service data through a first storage apparatus in the first storage device, to perform AI computing. This prevents the service data from moving across storage devices, avoids a delay caused by accessing another storage device to obtain the service data, shortens a delay of obtaining the service data, and accelerates AI computing.

Optionally, the first AI apparatus is configured to: obtain the service data from the first storage apparatus in the first storage device based on the first computing request, and perform AI computing on the service data to obtain a first computing result.

Optionally, before sending the first computing request to the first AI apparatus in the first storage device, the management apparatus is further configured to determine that a running status of the first storage device meets a specified condition. In this optional manner, it can be ensured that the selected first storage device is not occupied currently and can provide AI computing power, so as to avoid problems that device overheads are excessively high and an AI computing task cannot be completed in time because an occupied storage device is selected to perform AI computing.

Optionally, the dataset further includes other service data. The management apparatus is further configured to: when determining that the other service data is distributed on a second storage device in the plurality of storage devices, further determine a running status of the second storage device; and when the running status of the second storage device does not meet the specified condition, send a second computing request to the first storage device. A distance between the first storage device and the second storage device is less than a specified distance threshold. The first AI apparatus is further configured to: obtain the other service data from the second storage device based on the second computing request, and perform AI computing on the other service data to obtain a second computing result. In this optional manner, if a storage device in which the service data is located has been occupied, the management apparatus can select a storage device that is close to the service data, to provide AI computing. This shortens a service data transmission distance, and reduces cross-node service data movements.
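
This scheduling policy can be illustrated with the following Python sketch (all field names, the device table, and the distance threshold are assumptions): prefer the device holding the data; if it is busy, fall back to an idle device within the distance threshold.

    DISTANCE_THRESHOLD = 10

    def select_device(devices, data_location):
        home = devices[data_location]
        if home["idle"]:
            return data_location              # near-data AI computing
        candidates = [name for name, dev in devices.items()
                      if dev["idle"]
                      and dev["distance_to"][data_location] < DISTANCE_THRESHOLD]
        # Pick the idle device closest to where the data actually lives.
        return min(candidates, key=lambda n: devices[n]["distance_to"][data_location])

    devices = {
        "device1": {"idle": True,  "distance_to": {"device1": 0, "device2": 5}},
        "device2": {"idle": False, "distance_to": {"device1": 5, "device2": 0}},
    }
    print(select_device(devices, "device2"))  # device2 is busy -> device1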

Optionally, before determining whether the running status of the second storage device meets the specified condition, the management apparatus is further configured to: receive a second job request, and determine, based on the second job request, that a to-be-trained second dataset is distributed on the second storage device.

According to a third aspect, a data processing method is provided. The method is applied to a distributed storage system. The distributed storage system includes a plurality of storage devices. The plurality of storage devices include a first storage device. The method is used to implement a function provided in any implementation of the second aspect.

According to a fourth aspect, a data processing method is provided. The method is applied to a storage device. The storage device includes a processor, a hard disk, and an AI apparatus. The AI apparatus communicates with the processor through a high-speed interconnect network. The method is used to implement a function provided in any implementation of the first aspect.

According to a fifth aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, and the instruction is read by a storage device, so that the storage device is enabled to perform the data processing method provided in the fourth aspect or any optional manner of the fourth aspect.

According to a sixth aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, and the instruction is read by a distributed storage system, so that the distributed storage system is enabled to perform the data processing method provided in the third aspect or any optional manner of the third aspect.

According to a seventh aspect, a computer program product is provided. When the computer program product is run on a storage device, the storage device is enabled to perform the data processing method provided in the fourth aspect or any optional manner of the fourth aspect.

According to an eighth aspect, a computer program product is provided. When the computer program product is run on a distributed storage system, the distributed storage system is enabled to perform the data processing method provided in the third aspect or any optional manner of the third aspect.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is an architectural diagram of a distributed storage system according to an embodiment of this application;

FIG. 2 is a schematic structural diagram of a storage device according to an embodiment of this application;

FIG. 3 is a schematic structural diagram of a storage device according to an embodiment of this application;

FIG. 4 is an architectural diagram of a distributed storage system according to an embodiment of this application;

FIG. 5 is a schematic structural diagram of a storage device according to an embodiment of this application;

FIG. 6 is a schematic structural diagram of a storage device according to an embodiment of this application;

FIG. 7 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 8 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 9 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 10 is a schematic diagram of a data transmission procedure according to an embodiment of this application;

FIG. 11 is a schematic diagram of a data transmission procedure according to an embodiment of this application;

FIG. 12 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 13 is a schematic diagram of a data transmission procedure according to an embodiment of this application;

FIG. 14 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 15 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 16 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 17 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 18 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 19 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 20 is a schematic diagram of a data transmission procedure according to an embodiment of this application;

FIG. 21 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 22 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 23 is a schematic diagram of a data transmission procedure according to an embodiment of this application;

FIG. 24 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 25 is a schematic diagram of a data transmission procedure according to an embodiment of this application;

FIG. 26 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 27 is a schematic diagram of a data transmission procedure according to an embodiment of this application;

FIG. 28 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 29 is a schematic diagram of a data transmission procedure according to an embodiment of this application;

FIG. 30 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 31 is a flowchart of a data processing method according to an embodiment of this application;

FIG. 32 is a flowchart of a data processing method according to an embodiment of this application; and

FIG. 33 is a logical architectural diagram of a data processing method according to an embodiment of this application.

DESCRIPTION OF EMBODIMENTS

The following explains terms involved in this application.

Artificial intelligence (artificial intelligence, AI) is a theory, a method, a technology, and an application system for simulating, extending, and expanding human intelligence through a computer or a machine controlled by a computer, to sense an environment, obtain knowledge, and obtain an optimal result by using the knowledge. In other words, artificial intelligence is a comprehensive technology of computer science, which produces, based on the essence of intelligence, a new intelligent machine that can react in a manner similar to that of human intelligence. Artificial intelligence is the study of design principles and implementations of various intelligent machines, so that the machines have perception, inference, and decision-making functions.

Generally, AI implementation includes two phases: training and inference. Training means that a neural network model is obtained through training by using a large quantity of labeled samples, so that the neural network model can have a specific function. Inference, which is also referred to as prediction, means that a neural network model obtained through training is used to infer various conclusions based on new service data.

A high-speed serial computer expansion bus standard (peripheral component interconnect express, PCIe for short) bus is a local bus developed based on a peripheral component interconnect (peripheral component interconnect, PCI for short) bus, and is used to connect a processor and at least one peripheral device. A peripheral device that conforms to the PCIe bus standard is referred to as a PCIe device. The PCIe bus has at least one PCIe interface. Each PCIe interface may be a slot in physical form. Each PCIe interface is configured to connect to one PCIe device. Each PCIe device on the PCIe bus uses a serial interconnect manner, and different PCIe devices on the PCIe bus may perform service data transmission in a point-to-point manner. The PCIe protocol is generally compatible with technologies related to the PCI protocol and with PCI devices.

A Huawei cache coherence system (Huawei cache-coherent system, HCCS) is a protocol standard for maintaining consistency of service data between a plurality of ports (socket).

An AI parameter is a parameter in an AI model that is determined through AI training. In other words, the AI model may be considered as a function, and the AI parameter may be considered as a coefficient in the function. For example, if the AI model is a neural network, the AI parameter may be a weight of a convolution kernel in the neural network. For another example, if the AI model is a support vector machine, the AI parameter may be a support vector in the support vector machine. For another example, if the AI model is a linear regression model or a logistic regression model, the AI parameter may be a coefficient in the linear regression model or the logistic regression model. Certainly, the enumerated AI models are merely examples. The AI model may alternatively be another type of model, for example, a decision tree model, a random forest model, a confidence network, a reinforcement learning model, a transfer learning model, or an inductive learning model, or a combination thereof. Correspondingly, the AI parameter may alternatively be a parameter in the other type of model. A specific type of the AI parameter and a specific type of the AI model are not limited in the embodiments of this application.

An AI parameter adjustment process is critical to AI computing. Specifically, in an AI computing process, service data in a dataset is usually input into an AI model. The AI model performs inference and prediction on the service data based on an AI parameter, to obtain a prediction result. The AI parameter is adjusted based on an error between the prediction result and an actual result, so that the error is reduced when inference and prediction are performed next time based on the adjusted AI parameter. AI parameter adjustment is performed cyclically, so that the AI parameter can gradually become accurate through adjustment. When training ends, an AI model including an accurate parameter may be used to implement accurate inference and prediction, for example, to accurately perform facial recognition on a face image. Some embodiments of this application can greatly accelerate AI parameter adjustment. For example, by performing the following embodiment in FIG. 17, computing power of a processor in a storage device may be borrowed to greatly increase AI computing power. Therefore, AI parameter computing can be accelerated through stronger AI computing power. For another example, by performing the following embodiment in FIG. 26, a back-end storage network resource (that is, a network resource of a second network) may be borrowed to greatly increase a quantity of available network resources for AI computing. Therefore, AI parameter transmission can be accelerated through more network resources, thereby accelerating AI parameter adjustment and update.
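
The predict-compare-adjust cycle described above can be made concrete with a minimal gradient-style loop in Python, assuming a one-coefficient linear model (the samples, learning rate, and epoch count are invented for illustration):

    samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # (input, actual result)
    weight = 0.0                                      # the AI parameter
    learning_rate = 0.05

    for epoch in range(200):
        for x, actual in samples:
            prediction = weight * x                   # inference and prediction
            error = prediction - actual               # prediction error
            weight -= learning_rate * error * x       # adjust the AI parameter

    print(round(weight, 3))                           # approaches 2.0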

Direct memory access (direct memory access, DMA for short) is a technology for transmitting data between a memory and a peripheral device. Through DMA, the peripheral device may directly write data into the memory or access data in the memory without participation of a central processing unit (central processing unit, CPU). For example, the peripheral device may apply to a processor for memory space, and the processor allocates a memory buffer (buffer) to the peripheral device. Then, the peripheral device may directly write data into the memory buffer, or directly read data from the memory buffer. The peripheral device mentioned herein may be referred to as a DMA controller in a DMA-related protocol.
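
A toy model of this buffer handshake, for illustration only (all class names are hypothetical): the processor is involved once to allocate the buffer, after which the peripheral reads and writes it directly.

    class ToyProcessor:
        def __init__(self):
            self.memory = {}

        def allocate_buffer(self, name, size):
            # One-time CPU involvement: set up the memory buffer.
            self.memory[name] = [None] * size
            return self.memory[name]

    class ToyPeripheral:            # e.g., a smart hard disk
        def __init__(self, buffer):
            self.buffer = buffer    # direct reference, no CPU on the path

        def dma_write(self, offset, data):
            self.buffer[offset] = data

    cpu = ToyProcessor()
    disk = ToyPeripheral(cpu.allocate_buffer("ai_memory_buffer", 4))
    disk.dma_write(0, "sample batch")
    print(cpu.memory["ai_memory_buffer"][0])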

In an embodiment in FIG. 9 of this application, a hard disk may be the peripheral device described above, and an AI memory may be the memory described above. The hard disk may directly access the AI memory through DMA without participation of a processor and an AI processor in a storage device, thereby accelerating access to the AI memory.

Remote direct memory access (remote direct memory access, RDMA for short) is a technology for directly transmitting data from a memory in one device to a memory in another device. RDMA provides message queue-based point-to-point communication. Each application may obtain its own messages through a message queue, to avoid CPU intervention. In an RDMA-related protocol, an apparatus that performs an RDMA operation (RDMA verb) may be referred to as a remote direct memory access network interface controller (RDMA network interface controller, RNIC for short). The RDMA operation may include a storage operation (memory verb) and a message operation (messaging verb). The storage operation may be used to transmit data, and includes an RDMA write operation, an RDMA read operation, and an atomic operation. For example, a process of the RDMA write operation may include: A CPU of a destination device sends a virtual address of a memory area and permission information of the memory area to a CPU of a source device. The CPU of the source device stores to-be-written data into a memory area of the source device. The CPU of the source device generates an RDMA write instruction based on a virtual address of the memory area of the source device, the virtual address of the memory area of the destination device, and the permission information of the memory area of the destination device, and adds the RDMA write instruction to a transmit queue of the RNIC. Then, the source device notifies, through a doorbell mechanism, the RNIC to execute the instruction in the transmit queue. Next, the RNIC reads the instruction from the transmit queue to obtain the RDMA write instruction, and performs the RDMA write operation according to the RDMA write instruction. The RDMA read operation may include: The CPU of the destination device stores data in the memory area, and sends the virtual address of the memory area and the permission information of the memory area of the destination device to the CPU of the source device. The CPU of the source device receives the virtual address and the permission information, generates an RDMA read instruction based on the virtual address of the memory area of the source device, the virtual address of the memory area of the destination device, and the permission information of the memory area of the destination device, and adds the RDMA read instruction to the transmit queue. Then, the source device notifies, through the doorbell mechanism, the RNIC to execute the instruction in the transmit queue. The RNIC reads the instruction from the transmit queue to obtain the RDMA read instruction, and performs the RDMA read operation according to the RDMA read instruction. The message operation may be used to control a message, and may include an RDMA sending operation and an RDMA receiving operation.
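
For illustration only, a simplified Python simulation of the transmit-queue-and-doorbell flow just described (the queue format, the memories dictionary, and all names are assumptions; a real RNIC operates on registered memory regions in hardware):

    from collections import deque

    class RNIC:
        def __init__(self):
            self.transmit_queue = deque()

        def post(self, instruction):
            # The CPU adds an RDMA instruction to the transmit queue.
            self.transmit_queue.append(instruction)

        def ring_doorbell(self, memories):
            # The doorbell tells the RNIC to drain the transmit queue;
            # the data movement itself needs no further CPU work.
            while self.transmit_queue:
                op = self.transmit_queue.popleft()
                if op["verb"] == "rdma_write":
                    data = memories[op["src_mem"]][op["src_addr"]]
                    memories[op["dst_mem"]][op["dst_addr"]] = data

    memories = {"source": {0x0: "AI parameter"}, "destination": {}}
    rnic = RNIC()
    rnic.post({"verb": "rdma_write", "src_mem": "source", "src_addr": 0x0,
               "dst_mem": "destination", "dst_addr": 0x8})
    rnic.ring_doorbell(memories)
    print(memories["destination"])    # {8: 'AI parameter'}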

In the embodiment in FIG. 26 of this application, a first network interface (which may be referred to as an AI parameter network interface) in an AI apparatus may be an RNIC, and may access an AI memory in another storage device through RDMA, or write an AI parameter into an AI memory in another storage device through RDMA. In an embodiment in FIG. 22 of this application, a second network interface (which may be referred to as a back-end switching interface) in a storage apparatus may be an RNIC, and may access a memory in another storage device through RDMA, or write service data into a memory in another storage device through RDMA.

A distributed storage system provided in this application is described below by using an example.

FIG. 1 is an architectural diagram of a distributed storage system according to an embodiment of this application. A distributed storage system 1 includes a plurality of storage devices. The plurality of storage devices include a first storage device 10.

The first storage device 10 is configured to store service data and process AI computing. The first storage device 10 includes a first storage apparatus 100 and a first AI apparatus 101. The first AI apparatus 101 is disposed inside the first storage device 10. The first storage apparatus 100 may include a first processor 1001 and a first hard disk 1002.

The first AI apparatus 101 communicates with the first processor 1001 through a high-speed interconnect network.

The first processor 1001 is configured to: receive service data, and store the service data in the first hard disk 1002.

The first AI apparatus 101 is configured to: send a data obtaining request to the first processor 1001 to obtain the service data, and perform AI computing on the service data.

Optionally, the data obtaining request includes a first data obtaining request.

The first processor 1001 is configured to: in response to the first data obtaining request, obtain the service data from the first hard disk 1002, and send the service data to the first AI apparatus 101.

Optionally, the data obtaining request includes a second data obtaining request.

The first processor 1001 is configured to send metadata of the service data to the first AI apparatus 101 in response to the second data obtaining request. The metadata is used to indicate an address of the service data.

The first AI apparatus 101 is configured to: when the metadata indicates that the service data is located in the first storage device, send a first data access request to the first hard disk 1002. The first data access request includes the metadata.

The first hard disk 1002 is configured to: obtain the service data based on the metadata, and write the service data into the first AI apparatus 101 through DMA.

Optionally, the plurality of storage devices further include a second storage device 11. The first storage device 10 and the second storage device 11 are two different storage devices. The first storage device 10 communicates with the second storage device 11 through a network.

The second storage device 11 is configured to store service data and process AI computing. The second storage device 11 includes a second storage apparatus 110 and a second AI apparatus 111. The second AI apparatus 111 is disposed inside the second storage device 11.

The second storage apparatus 110 includes a second processor 1101 and a second hard disk 1102. The second AI apparatus 111 communicates with the second processor 1101 through the high-speed interconnect network.

The second processor 1101 is configured to: receive service data, and store the service data in the second hard disk 1102.

The second AI apparatus 111 is configured to: send a data obtaining request to the second processor 1101 to obtain the service data, and perform AI computing on the service data.

Optionally, the data obtaining request includes a third data obtaining request.

The first processor 1001 is configured to send metadata of the service data to the first AI apparatus 101 in response to the third data obtaining request. The metadata is used to indicate an address of the service data.

The first AI apparatus 101 is configured to: when the metadata indicates that the service data is located in the second storage device 11 in the plurality of storage devices, send a second data access request to the second storage device 11, where the second data access request includes the metadata; and receive the service data that is sent by the second storage device 11 in response to the second data access request.

Optionally, the second storage device 11 is configured to transmit the service data to the first storage device 10 through a second network 12. The second storage device 11 is further configured to transmit an AI parameter to the first storage device 10 through a first network 13. The AI parameter is used to perform AI computing on the service data.

The first network 13 may be referred to as an AI parameter network, and is used to transmit an AI parameter between the first storage device 10 and the second storage device 11. One or more network devices such as a router and a switch may be disposed in the first network 13. Specifically, the network device may be separately connected to the first AI apparatus 101 and the second AI apparatus 111, and an AI parameter is transmitted between the first AI apparatus 101 and the second AI apparatus 111 through the network device. For example, the network device may be connected to a first network interface in the first AI apparatus 101 and a first network interface in the second AI apparatus 111.

The second network 12 may be referred to as a back-end storage network, and is used to transmit service data between the first storage device 10 and the second storage device 11. One or more network devices such as a router and a switch may be disposed in the second network 12. Specifically, the network device may be connected to the first storage apparatus 100 and the second storage apparatus 110, and service data is transmitted between the first storage apparatus 100 and the second storage apparatus 110 through the network device. For example, the network device may be connected to a second network interface in the first storage apparatus 100 and a second network interface in the second storage apparatus 110.

The first network 13 and the second network 12 are deployed, so that the AI parameter can be transmitted through the first network 13, and the service data can be transmitted through the second network 12. Therefore, for the storage system 1, a network resource used to forward an AI parameter can be separated from a network resource used to forward other data, so as to prevent original storage network resources of a storage device from being occupied during AI parameter transmission, thereby preventing network transmission performance of the storage device from being deteriorated when a network bandwidth of the storage device is occupied in an AI computing process. In addition, the first network 13 can be dedicated to forwarding AI-related service data. Therefore, based on the network, impact on networking of an existing service data center or a storage device cluster can be avoided.

Optionally, a third network 14 may be further deployed for the distributed storage system 1. The third network 14 may be referred to as a service network. The first storage device 10 may communicate with the second storage device 11 through the third network 14. The third network 14 may include one or more network devices. The network device may be separately connected to a service interface in the first storage apparatus 100 and a service interface in the second storage apparatus 110.

It should be understood that, in this embodiment, the first storage device 10 and the second storage device 11 in the distributed storage system 1 are merely used as examples for description. A person skilled in the art may learn that the distributed storage system 1 may include more or fewer storage devices. For example, the distributed storage system 1 may include only the first storage device 10 and the second storage device 11. Alternatively, the distributed storage system 1 may include dozens, hundreds, or more storage devices. In this case, the distributed storage system 1 further includes other storage devices in addition to the first storage device 10 and the second storage device 11. A quantity of storage devices is not limited in this embodiment. Particularly, as a requirement on AI computing power increases, a scale of the storage system 1 in this embodiment may be accordingly enlarged. For example, the storage system 1 may include millions of storage devices, so that overall computing power of the system 1 is enhanced to a million level or higher.

In addition, the first storage device 10 may be generally any one of the plurality of storage devices in the distributed storage system 1, and the second storage device 11 may be generally any one of the plurality of storage devices in the distributed storage system 1 other than the first storage device 10. In this embodiment, the first storage device 10 and the second storage device 11 are merely used as examples for description.

In the distributed storage system provided in this embodiment, two AI apparatuses in different storage devices exchange an AI parameter with each other through the first network, and two storage apparatuses in the different storage devices exchange service data with each other through the second network, to collaboratively perform AI computing based on the AI parameter and the service data. Storage capabilities and AI computing power of a plurality of storage devices are converged, so that an overall storage capability and AI computing power of the system can be increased. The first network may be a PCIe high-speed interconnect network, a fibre channel (FC), a SCSI, the Ethernet, RDMA, a memory fabric, or the like. The second network may be a PCIe high-speed interconnect network, an FC, a SCSI, the Ethernet, RDMA, a memory fabric, or the like.

Optionally, the second storage device 11 is further configured to transmit other service data to the first storage device 10 through the first network 13. The second storage device 11 is further configured to transmit another AI parameter through the second network 12. The another AI parameter is used to perform AI computing on the other service data.

For example, when a quantity of resources of the second network is less than a specified storage network resource threshold, the other service data may be transmitted between the first storage apparatus 100 and the second storage apparatus 110 through the first network 13. Specifically, the first storage apparatus 100 includes the first processor and a first memory, and the second storage apparatus 110 includes the second processor and a second memory. The first AI apparatus 101 includes a first AI processor and a first AI memory. The first AI processor sends a network resource request of the first network to a second AI processor, and sends a memory RDMA access request to the second processor. The first AI apparatus 101 reads the other service data from the first memory, and sends the other service data to the second AI apparatus 111 through the first network 13. The second AI apparatus 111 writes the other service data into the second memory.

Optionally, when a quantity of resources of the first network is less than a specified AI network resource threshold, the another AI parameter is transmitted between the first AI apparatus 101 and the second AI apparatus 111 through the second network 12. Specifically, the first AI processor sends a network resource request of the second network to the second processor. The first AI processor is configured to: obtain the another AI parameter from the first AI memory, and transmit the another AI parameter to a second AI memory through RDMA over the second network 12.

Optionally, the system 1 further includes a management apparatus. The management apparatus is a software module. Functionally, the management apparatus is configured to manage all the storage devices in the distributed storage system 1, for example, schedule the storage devices to process AI computing. The management apparatus may be an independent device, for example, a host, a server, a personal computer, or another device. Alternatively, the management apparatus may be located in any storage device. When the management apparatus is an independent device, the management apparatus may communicate with any storage device in the system through a network.

A specific example is used below to describe how the management apparatus schedules the storage devices. When receiving a first job request, the management apparatus determines distribution of a to-be-trained dataset based on the first job request. When determining that the service data in the dataset is distributed on the first storage device 10, the management apparatus sends a first computing request to the first AI apparatus 101 in the first storage device 10. The first AI apparatus 101 is configured to: obtain the service data from the first storage device 10 based on the first computing request, and perform AI computing on the service data to obtain a first computing result. This is a near-data AI computing manner, to be specific, an AI apparatus in a storage device in which the dataset is located is preferentially selected to perform AI computing on the dataset. This can avoid cross-network data transmission to some extent, and save network resources. It should be understood that, in some cases, a running status of the storage device in which the dataset is located may be poor, and consequently it is not appropriate for the storage device to receive an AI computing task. Therefore, in this embodiment, before sending the first computing request to the first AI apparatus 101 in the first storage device 10, the management apparatus is further configured to determine that a running status of the first storage device 10 meets a specified condition.

Therefore, when the running status of the first storage device 10 does not meet the specified condition, a storage device whose distance from the first storage device 10 is less than a specified distance threshold, for example, the second storage device 11, may be selected to perform AI computing.

In some embodiments, the to-be-trained dataset may further include other service data.

The management apparatus is further configured to: when determining that the other service data is distributed on the second storage device 11 in the plurality of storage devices, further determine a running status of the second storage device 11; and when the running status of the second storage device 11 does not meet a specified condition, send a second computing request to the first storage device 10. A distance between the first storage device 10 and the second storage device 11 is less than a specified distance threshold.

The first AI apparatus 101 is further configured to: obtain the other service data from the second storage device 11 based on the second computing request, and perform AI computing on the other service data to obtain a second computing result.

In some possible embodiments, the storage system 1 may further include a host (not shown in FIG. 1). The host is configured to collaborate with the first storage device 10 and the second storage device 11 to perform AI computing. The host may be an AI server, or may be an application server. The AI server is configured to execute an AI computing task. For example, the AI server may perform model training or service data processing. In a running process of the AI server, the AI server may exchange a model parameter with the first storage device 10 and/or the second storage device 11. The application server may receive a model training instruction or a service data processing instruction from a terminal, and perform resource scheduling, AI algorithm storage, AI algorithm updating, or the like in an AI computing process. The host may provide AI computing power through a general-purpose processor such as a CPU, or may provide AI computing power through a processor such as a graphics processing unit (English: graphics processing unit, GPU for short), a neural processing unit (English: neural-network processing unit, NPU for short), a tensor processing unit (English: tensor processing unit, TPU for short), or a field-programmable gate array (English: field-programmable gate array, FPGA for short). The host may be a physical device, or may be a virtual device such as an elastic cloud server leased from a cloud platform. In the AI computing process, the first storage device 10 and the second storage device 11 may undertake a primary computing task, and the host undertakes a secondary computing task; or the host undertakes a primary computing task, and the first storage device 10 and the second storage device 11 undertake a secondary computing task.

In some possible embodiments, the second network 12 may include an internal network device and an external network device.

The internal network device may be disposed inside the first storage device 10 and/or the second storage device 11. The internal network device may be connected to components of the first storage device 10 and/or the second storage device 11 through the high-speed interconnect network. The internal network device may be a bus, which may specifically be a serial interface bus, for example, any one of a PCIE bus, an HCCS bus, the Ethernet, an IB, and an FC. The first storage device 10 and/or the second storage device 11 may back up stored data through the internal network device, to protect service data and an AI parameter.

The external network device may be disposed outside the first storage device 10 and the second storage device 11. The external network device is connected to the first storage device 10 and the second storage device 11 through a network. The external network device may be a switch, a router, or the like. The first storage device 10 and the second storage device 11 each may provide a data storage service for an application through the external network device, for example, a file storage service, an object storage service, or a block storage service.

In some possible embodiments, an internal network device and an external network device may be disposed in the first network 13. The internal network device and the external network device in the first network 13 are respectively similar to the internal network device and the external network device in the second network 12. Details are not described herein again.

In some possible embodiments, the internal network device and the external network device in the second network 12 and/or the first network 13 may be deployed together as a whole.

In some possible embodiments, the quantity of storage devices in the system 1 may be dynamically increased or reduced based on a requirement on AI computing power, to implement elastic capacity expansion or elastic capacity shrinking. Specifically, the storage system 1 may be considered as a resource pool, the storage device may be considered as a resource in the resource pool, and the resource can provide both AI computing power and a storage capability. If the storage system 1 has insufficient AI computing power, for example, if the load of each AI apparatus has exceeded a load threshold, a storage device may be added to the storage system 1, and the existing storage devices and the newly added storage device in the storage system 1 jointly provide AI computing power, to implement capacity expansion and relieve the load of an AI apparatus in a single storage device. If the storage system 1 is idle, the quantity of storage devices in the storage system 1 may be reduced, to implement capacity shrinking.
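
A minimal sketch of such an elastic scaling rule, assuming invented thresholds and per-device load figures (illustration only, not the claimed mechanism):

    LOAD_THRESHOLD = 0.8              # expand when every device is this busy
    IDLE_THRESHOLD = 0.2              # shrink when every device is this idle

    def rescale(device_loads, min_devices=1):
        if all(load > LOAD_THRESHOLD for load in device_loads):
            return len(device_loads) + 1     # capacity expansion
        if (all(load < IDLE_THRESHOLD for load in device_loads)
                and len(device_loads) > min_devices):
            return len(device_loads) - 1     # capacity shrinking
        return len(device_loads)

    print(rescale([0.9, 0.95, 0.85]))        # 4: add a storage device
    print(rescale([0.05, 0.1]))              # 1: the pool is idle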

A storage device provided in this application is described below by using an example.

FIG. 2 is a schematic structural diagram of a storage device according to an embodiment of this application. A storage device 2 may be the first storage device 10 in the system 1 shown in FIG. 1 , or may be the second storage device 11 in the system 1 shown in FIG. 1 , or certainly may be another storage device not shown in FIG. 1 .

As shown in FIG. 2 , the storage device 2 includes a storage apparatus 20 and an AI apparatus 21.

The storage apparatus 20 includes a hard disk 201 and a processor 202.

The processor 202 is configured to: receive service data, and store the service data in the hard disk 201. The service data may be used as an input for AI computing. For example, the service data may be a sample set used for model training, or a dataset used for inference and prediction. For example, in a model training phase, the service data may include one or more of a sample image, a sample video, a sample speech, or a sample text. The sample image may be used for training to obtain an image recognition model, the sample video may be used for training to obtain a video recognition model or a target analysis model, the sample text may be used for training to obtain a semantic recognition model, or the like. For another example, in a model application phase, the service data may be a to-be-recognized image, video, speech, or text, which may be used for image recognition, facial recognition, speech recognition, or text understanding. In some embodiments, the processor 202 may provide a computing resource, and the storage apparatus 20 may process the service data through the processor 202.

The hard disk 201 may provide storage space. The hard disk 201 is configured to store the service data. The hard disk 201 may be a smart disk. The smart disk has components such as a processor, a memory, and a DMA controller. Therefore, the smart disk can undertake more functions than a conventional hard disk, for example, can transmit data through DMA.

The AI apparatus 21 is configured to: send a data obtaining request to the processor 202 to obtain the service data, and perform AI computing on the service data. The AI apparatus 21 is disposed inside the storage device 2. The AI apparatus 21 provides an AI computing capability for the storage device 2. By running the AI apparatus 21, AI computing methods in the following method embodiments may be performed. The AI apparatus 21 may be in a form of a chip or another physical component. For example, the AI apparatus 21 may be a training chip configured to construct a neural network model, or may be an inference chip configured to perform inference through a neural network model. The data obtaining request is used to request to obtain the service data stored in the hard disk 201.

The AI apparatus 21 communicates with the processor 202 through a high-speed interconnect network 22. The high-speed interconnect network 22 is used for data communication between the AI apparatus 21 and the processor 202. The high-speed interconnect network 22 may be any one of PCIe, a memory fabric, the high-speed Ethernet, an HCCS, an InfiniBand (infiniband, IB), and a fibre channel (fiber channel, FC). The high-speed interconnect network 22 may be in a form of a bus. In this case, the high-speed interconnect network 22 may also be referred to as a high-speed interconnect switch or a high-speed interconnect bus. For example, the storage device 2 may include a high-speed interconnect bus, and the AI apparatus 21 and the processor 202 may be connected to the high-speed interconnect bus, to access the high-speed interconnect network. In some possible embodiments, the AI apparatus 21 may include a high-speed interconnect network interface, and the processor 202 may include a high-speed interconnect network interface. The AI apparatus 21 is connected to the high-speed interconnect bus through the high-speed interconnect network interface of the AI apparatus 21, and the processor 202 is connected to the high-speed interconnect bus through the high-speed interconnect network interface of the processor 202. The high-speed interconnect network interface may be a serial bus interface. Specifically, the high-speed interconnect network interface may be any one of a PCIE interface, an HCCS interface, an Ethernet interface, an IB interface, and an FC interface. If there are different types of high-speed interconnect network interfaces, the service data may be transmitted between the AI apparatus 21 and the processor 202 at different speeds. Experiments show that the service data loading rate can be increased by 2 to 10 times, depending on the type of interface. In addition, the storage device 2 may supply electric energy to the AI apparatus 21 through the high-speed interconnect network interface of the AI apparatus 21. It should be understood that the high-speed interconnect bus is merely an example of the high-speed interconnect network 22. The high-speed interconnect network 22 may not be the high-speed interconnect bus, but another bus having a memory pass-through function. A specific type of the high-speed interconnect network 22 is not limited in this embodiment.

In the storage device, there may be a plurality of components that communicate with each other through the high-speed interconnect network 22. For example, a memory 203 and an AI memory 210 in the storage device may communicate with each other through the high-speed interconnect network 22. For another example, the hard disk 201 and the AI memory 210 may communicate with each other through the high-speed interconnect network 22. For another example, the hard disk 201 and the memory 203 may communicate with each other through the high-speed interconnect network 22. For another example, the processor 202 and an AI processor 213 in the storage device may communicate with each other through the high-speed interconnect network 22. In addition, if there are a plurality of memories 203, different memories may communicate with each other through the high-speed interconnect network 22. If there are a plurality of AI memories 210, different AI memories may communicate with each other through the high-speed interconnect network 22. Certainly, the foregoing components that communicate with each other are merely used as examples for description. All different components that are connected through the high-speed interconnect network 22 may communicate with each other through the high-speed interconnect network 22. An execution body of performing communication through the high-speed interconnect network 22 is not limited in this embodiment.

There may be a plurality of manners of performing communication through the high-speed interconnect network 22. The following two manners are used as examples for description.

Manner 1: Communication is performed through DMA. For the DMA-based communication, refer to the following embodiment in FIG. 9 and the foregoing description of the DMA technical principle. Details are not described herein again.

Manner 2: Communication is performed through a memory fabric.

Referring to FIG. 3 , the storage device may include a memory fabric 23. The memory fabric 23 may integrate a function of the high-speed interconnect network 22 and a function of a first network interface 211. The AI memory 210 may be connected to the memory 203 through the memory fabric 23, and the AI memory 210 may communicate with the memory 203 through the memory fabric 23. In addition, the memory fabric 23 may also implement communication between memories across devices. For example, the AI memory 210 may be connected to an AI memory in another storage device through the memory fabric 23, and the AI memory 210 may communicate with the AI memory in the other storage device through the memory fabric 23. For another example, the memory 203 may be connected to a memory in another storage device through the memory fabric 23, and the memory 203 may communicate with the memory in the other storage device through the memory fabric 23. The AI memory 210 may also be connected to a memory in another storage device through the memory fabric 23, and the AI memory 210 may communicate with the memory in the other storage device through the memory fabric 23. For another example, the memory 203 may be connected to an AI memory in another storage device through the memory fabric 23, and the memory 203 may communicate with the AI memory in the other storage device through the memory fabric 23. In addition, in some embodiments, the memory fabric 23 may further integrate a function of a second network interface 204, or integrate functions of other components in the storage device that are configured to perform network communication. In this case, the components other than the memory in the storage device may also communicate with each other through the memory fabric 23.

For the manner of communication performed through the memory fabric 23, in some possible embodiments, memories that are connected to each other through the memory fabric 23 may constitute a memory resource pool, and corresponding addresses are assigned to the memory space of the memories in the memory resource pool in a unified manner, so that the memory space of all the memories in the memory resource pool belongs to a same address range. After service data or an AI parameter is written into any memory space of any memory in the memory resource pool, metadata of the service data or the AI parameter may be obtained based on an address of the memory space, and then the corresponding memory space may be found through addressing through the memory fabric 23 based on the metadata, to read the service data or the AI parameter from the memory space. An addressing object may be local memory space of the storage device, or may be memory space of a remote storage device.

For example, referring to FIG. 4 , a first memory, a second memory, a first AI memory, and a second AI memory may communicate with each other through a memory fabric. The first memory, the second memory, the first AI memory, and the second AI memory may constitute a memory resource pool of the system 1. For example, when a first AI processor or a first processor needs to obtain an AI parameter, the first AI processor or the first processor determines, based on metadata of the AI parameter, an address of the memory space in which the AI parameter is located in the memory resource pool, and obtains the memory space through addressing through the memory fabric.
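As a rough illustration of the unified addressing described above, the following Python sketch maps each memory in the pool into one global address range and resolves a global address recorded in metadata back to a concrete memory and offset. The class, names, and sizes are illustrative assumptions, not structures defined by this embodiment.

```python
class MemoryPool:
    """Toy model of a memory resource pool with one unified address range."""
    def __init__(self):
        self.regions = []    # (start, end, memory_name) within the global range
        self.next_addr = 0

    def attach(self, memory_name, size):
        # Assign this memory a disjoint slice of the unified address range.
        start = self.next_addr
        self.regions.append((start, start + size, memory_name))
        self.next_addr += size
        return start         # base of this memory inside the global range

    def resolve(self, global_addr):
        # Addressing step: find which memory (local or remote) holds the data.
        for start, end, name in self.regions:
            if start <= global_addr < end:
                return name, global_addr - start
        raise ValueError("address outside the memory resource pool")

pool = MemoryPool()
pool.attach("first memory", 64 << 30)        # assumed 64 GiB
pool.attach("first AI memory", 32 << 30)     # assumed 32 GiB
pool.attach("second AI memory", 32 << 30)    # assumed 32 GiB
# Metadata for a cached AI parameter would record a global address such as:
print(pool.resolve(70 << 30))  # -> ('first AI memory', 6442450944)
```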

Unified scheduling of a memory and an AI memory in one storage device, unified scheduling of memories in different storage devices, and unified scheduling of AI memories in different storage devices can be implemented through the memory fabric. This improves memory resource scheduling and use efficiency of a storage system.

That the AI apparatus 21 communicates with the storage apparatus 20 through the high-speed interconnect network 22 may bring at least the following effects: In an AI computing process, when the AI apparatus 21 needs to load service data required for AI computing, the service data can be sent from the storage apparatus 20 to the AI apparatus 21 through the high-speed interconnect network 22, and loaded to the AI memory 210 in the AI apparatus 21. When the AI apparatus 21 needs to store a computing result of AI computing, the computing result can be sent from the AI apparatus 21 to the hard disk 201 in the storage apparatus 20 through the high-speed interconnect network 22, so that the computing result can be stored in the hard disk 201. In this way, for both the loading path and the storage path, the service data is transmitted inside the storage device 2 through the high-speed interconnect network 22, instead of being transmitted outside the storage device 2 through forwarding by one or more network links and one or more switches. Therefore, a path for transmitting the service data can be greatly shortened, so that the AI apparatus 21 can obtain nearby the service data stored in the storage apparatus 20, and store the computing result in the storage apparatus 20 nearby. In addition, the service data is transmitted based on the high-speed interconnect network 22 instead of a common remote network that follows a TCP/IP protocol. A transmission speed of the high-speed interconnect network 22 is usually far faster than a transmission speed of the remote network, and a cumbersome procedure of establishing a network communication connection, such as the three-way handshake required for network communication, can be avoided. Therefore, the AI apparatus 21 can quickly load the service data required for AI computing from the storage apparatus 20, and quickly store the computing result of AI computing in the hard disk 201 in the storage apparatus 20. Experiments show that the delay of each service data loading operation can be shortened by more than 30%.

It should be understood that the storage device 2 may include one or more AI apparatuses 21. A quantity of AI apparatuses 21 may be set based on a structure, space, or a requirement of the storage device 2. The quantity of AI apparatuses 21 included in the storage device 2 is not limited in this embodiment.

If the storage device 2 includes a plurality of AI apparatuses 21, the plurality of AI apparatuses 21 may communicate with each other through the high-speed interconnect network 22, for example, exchange an AI parameter through the high-speed interconnect network 22. Shapes and structures of different AI apparatuses 21 in the storage device 2 may be the same, or may be slightly different based on an actual requirement.

The storage device 2 may be a storage device with distributed control, or may be a storage device with centralized control. A form of the storage device 2 may be but is not limited to a storage server, a storage array, or another dedicated storage device. The storage device 2 may run in, but is not limited to running in, a cloud environment, an edge environment, or a terminal environment.

The storage apparatus and the AI apparatus are disposed in the storage device provided in this embodiment, so that the storage device can provide an AI computing capability through the AI apparatus and provide a service data storage capability through the storage apparatus, thereby implementing convergence of storage and AI computing power. The AI parameter and the service data are transmitted inside the storage device through the high-speed interconnect network without a need of being forwarded through an external network. Therefore, a path for transmitting the service data and the AI parameter is greatly shortened, and the service data can be loaded nearby, thereby accelerating loading. In addition, the AI apparatus can borrow a computing resource of the storage apparatus to process the service data, so that the computing power of the AI apparatus is increased through computing power collaboration, thereby accelerating AI computing.

In the foregoing embodiment, the AI apparatus 21 may send a data obtaining request to the processor 202 to obtain the service data from the hard disk 201, to perform AI computing based on the service data. In this manner, the AI apparatus 21 can obtain the service data from the storage device. This avoids communication overheads caused by requesting the service data from a remote storage device through a network, and shortens a delay of obtaining the service data.

The AI apparatus 21 may specifically obtain the service data in a plurality of implementations. Correspondingly, there may be a plurality of cases of the data obtaining request sent by the AI apparatus, and data obtaining requests in different cases may be slightly different. The following uses Manner 1, Manner 2, and Manner 3 as examples for description. For distinguishing description, a data obtaining request in Manner 1 is referred to as a first data obtaining request, a data obtaining request in Manner 2 is referred to as a second data obtaining request, and a data obtaining request in Manner 3 is referred to as a third data obtaining request.

Manner 1: The storage device may transmit the service data stored in the hard disk 201 to the AI apparatus 21 through the processor 202. Specifically, the AI apparatus 21 is further configured to send the first data obtaining request to the processor 202. The first data obtaining request is used to request to obtain the service data. The processor 202 is further configured to: receive the first data obtaining request; and in response to the first data obtaining request, obtain the service data from the hard disk 201, and send the service data to the AI apparatus 21.

Manner 2: The storage device may transmit the service data stored in the hard disk 201 to the AI apparatus 21 through DMA. Specifically, the AI apparatus 21 is further configured to send the second data obtaining request to the processor 202. The second data obtaining request is used to request to obtain the service data. The processor 202 is further configured to: receive the second data obtaining request, and send metadata of the service data to the AI apparatus 21 in response to the second data obtaining request. The metadata is used to indicate an address of the service data. The AI apparatus 21 may determine, based on the metadata, whether the service data is located in the storage device. When the metadata indicates that the service data is located in the storage device, the AI apparatus 21 sends a first data access request to the hard disk 201. The first data access request includes the metadata. For example, the first data access request may be a DMA request. The hard disk 201 is configured to: obtain the service data based on the metadata, and write the service data into the AI apparatus 21 through DMA. In Manner 2, the service data is located in the storage device.

The hard disk 201 may communicate with the AI apparatus 21. Therefore, the foregoing procedure of interaction based on the first data access request can be implemented. In an example embodiment, the hard disk 201 may include a high-speed interconnect network interface. The hard disk 201 may be connected to the AI apparatus 21 through the high-speed interconnect network interface. The AI apparatus 21 may perform read/write in the hard disk 201 through a controller or a driver corresponding to the high-speed interconnect network interface. For example, the high-speed interconnect network interface of the hard disk 201 may be a serial attached SCSI (Serial Attached SCSI, SAS) interface. The AI apparatus 21 may communicate with the SAS interface through an SAS controller, to perform read/write in the hard disk 201. Certainly, the high-speed interconnect network interface of the hard disk 201 may alternatively be an interface other than the SAS interface, for example, an advanced technology attachment (advanced technology attachment, ATA) interface, an integrated drive electronics (integrated drive electronics, IDE) interface, an FC interface, or a small computer system interface (small computer system interface, SCSI).

Manner 3: The storage device may transmit service data stored in another storage device to the AI apparatus 21 through RDMA. Specifically, the AI apparatus 21 is further configured to send the third data obtaining request to the processor 202. The third data obtaining request is used to request to obtain the service data. The processor 202 is further configured to: receive the third data obtaining request, and send metadata of the service data to the AI apparatus 21 in response to the third data obtaining request. The metadata is used to indicate an address of the service data. The AI apparatus 21 may determine, based on the metadata of the service data, whether the service data is located in the other storage device. When the metadata indicates that the service data is located in the other storage device, the AI apparatus 21 sends a second data access request to the other storage device. The second data access request includes the metadata. For example, the second data access request may be an RDMA request. The other storage device may obtain the service data based on the metadata in response to the second data access request, and write the service data into the AI apparatus 21 through RDMA. In Manner 3, the service data is located in the other storage device.

The AI apparatus 21 may include the AI memory 210. The AI memory 210 is configured to cache an AI parameter and/or service data that has just been used or is cyclically used by an AI computing power unit 212 or the AI processor 213. If the AI computing power unit 212 or the AI processor 213 needs to use the service data again, the AI computing power unit 212 or the AI processor 213 may directly invoke the service data from the AI memory 210, to avoid repeated access, thereby reducing a waiting time of the AI computing power unit 212 or the AI processor 213, and improving computing efficiency. In an AI computing process, the AI memory 210 may cache input service data, an intermediate result, or a final result of AI computing. For example, the AI memory 210 may be a high-speed cache. The AI memory 210 may include a high-speed random access memory, or may include a nonvolatile memory, for example, at least one magnetic disk storage device, a flash memory device, or a universal flash storage (universal flash storage, UFS).

The AI apparatus 21 is configured to send the first data access request to the hard disk 201. The first data access request includes the metadata. The metadata is used to indicate an address of the service data. For example, the metadata may include a track identifier and a sector identifier. The track identifier is used to identify a track on which the service data is located in the hard disk. For example, the track identifier may be a track ID. The sector identifier is used to identify a sector in which the service data is located in the hard disk. For example, the sector identifier may be a sector ID. The hard disk may find a corresponding track based on the track identifier, find a corresponding sector on the track based on the sector identifier, and then read data in the sector, to obtain the service data requested by the AI apparatus 21 through the first data access request. It should be understood that the metadata may indicate the address in a plurality of manners. For example, the metadata may indicate the address of the service data by indicating a start address and a length of the service data. For another example, the metadata may indicate the address of the service data by indicating a start address and an end address of the service data. This is not limited in this embodiment.
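The two metadata variants described above can be pictured with the following minimal Python shapes; the field names are assumptions chosen for this sketch rather than a format defined by this embodiment.

```python
from dataclasses import dataclass

@dataclass
class TrackSectorMetadata:
    """Address given as a physical location on the hard disk."""
    track_id: int     # identifies the track on which the service data lies
    sector_id: int    # identifies the sector within that track

@dataclass
class RangeMetadata:
    """Address given as a start address plus a length."""
    start_address: int
    length: int       # alternatively, an end address could be recorded instead

    @property
    def end_address(self) -> int:
        return self.start_address + self.length

meta = RangeMetadata(start_address=0x1000, length=4096)
print(hex(meta.end_address))  # 0x2000
```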

The hard disk 201 is configured to: obtain the service data based on the metadata, and write the service data into the AI memory 210 through DMA.

Optionally, the storage device 2 communicates with another storage device. For example, the storage device 2 may be the first storage device 10 in the system 1 shown in FIG. 1 , the other storage device may be the second storage device 11 in the system 1 shown in FIG. 1 , and the first storage device 10 may communicate with the second storage device 11.

In a possible implementation, the AI apparatus 21 may include the first network interface 211. The first network interface 211 may be in a form of a network interface card. The first network interface 211 may provide a network communication capability. The AI apparatus 21 can communicate with another storage device through the first network interface 211, for example, exchange an AI parameter with the other storage device through the first network interface 211. Optionally, the first network interface 211 may support a remote direct memory access (remote direct memory access, RDMA) function, and the first network interface 211 may be a remote direct memory access network interface controller (RDMA network interface controller, RNIC).

In an example application scenario, in a training process or an inference process, if the AI apparatus 21 obtains an AI parameter through computing, the AI apparatus 21 may send the AI parameter to the other storage device through the first network interface 211, so that the other storage device receives the AI parameter. Similarly, if the other storage device obtains an AI parameter through computing, the other storage device may send the AI parameter to the first network interface 211, and the first network interface 211 may receive the AI parameter.

The AI apparatus 21 is further configured to: when it is determined, based on the metadata, that the service data is located in the other storage device, send the second data access request to the other storage device, so that the service data is written into the AI memory 210 through RDMA. The second data access request includes the metadata.

Optionally, the AI apparatus 21 further includes the AI computing power unit 212. The AI computing power unit 212 is configured to provide AI computing power. In an example embodiment, an AI algorithm may be run on the AI computing power unit 212, to perform model training or inference and prediction. The AI algorithm may be a neural network model. Essentially, the AI algorithm includes matrix or vector multiplication and addition, and may further include a division operation and an exponential operation. The AI computing power unit 212 may include one or more of the following: a graphics processing unit (graphics processing unit, GPU), a neural processing unit (neural-network processing unit, NPU), a tensor processing unit (tensor processing unit, TPU), a field-programmable gate array (field-programmable gate array, FPGA), an application-specific integrated circuit (application specific integrated circuit, ASIC), a brain-like chip, a reconfigurable general-purpose AI chip, a CPU, a programmable logic device (programmable logic device, PLD), a controller, a state machine, gate logic, a discrete hardware component, or any combination of other circuits that can provide AI computing power. The AI computing power unit 212 may include one or more processing cores.

The AI computing power unit 212 is specifically configured to: obtain the service data from the AI memory 210, and perform AI computing.

Optionally, the AI apparatus 21 includes the AI processor 213. The AI processor 213 may be connected to the AI memory 210 through the high-speed interconnect network 22. The AI processor 213 may be a CPU. The AI processor 213 may be configured to perform management and resource scheduling. The AI processor 213 may run an operating system or an application program. The AI processor 213 may include one or more processing cores.

Optionally, the storage apparatus 20 includes the processor 202 and the memory 203. The processor 202 may be connected to the memory 203 through the high-speed interconnect network 22. The processor 202 may be a central processing unit (central processing unit, CPU) in the storage apparatus 20. The processor 202 may alternatively be implemented in at least one hardware form of a digital signal processor (digital signal processing, DSP), an FPGA, or a programmable logic array (programmable logic array, PLA).

In some possible embodiments, the storage device may be a storage device with centralized control. Specifically, the storage device may include one or more controllers, and may control the hard disk through the controller. In addition, the storage device may further include one or more cascading boards. The cascading board may be configured to cascade different hard disks in the storage device. The cascading board may be connected to the hard disk and the controller. For example, referring to FIG. 5 , the storage device may be of a dual-controller architecture. The dual-controller architecture means that there are two controllers in the storage device. For example, in FIG. 5 , the system 1 may include two controllers: a controller 1 and a controller 2. For another example, referring to FIG. 6 , the storage device may be of a four-controller architecture. The four-controller architecture means that there are four controllers in the storage device. For example, in FIG. 6 , the system 1 may include four controllers: a controller 1, a controller 2, a controller 3, and a controller 4.

If the storage device is a storage device with centralized control, an AI apparatus may be disposed inside the controller. In this case, the controller may further provide AI computing power in addition to implementing an original function of controlling the hard disk. If the storage device includes a plurality of controllers, an AI apparatus may be disposed in each controller; or an AI apparatus may be disposed in some controllers, and no AI apparatus is disposed in the other controllers. This is not limited in this embodiment.

The memory 203 is configured to cache the service data. Optionally, with a function of supporting pass-through between the AI memory 210 and the memory 203, the memory 203 may also be configured to cache an AI parameter. When the processor 202, the AI computing power unit 212, or the AI processor 213 needs to use the service data and/or the AI parameter, the processor 202, the AI computing power unit 212, or the AI processor 213 may directly invoke the service data and/or the AI parameter from the memory 203, to avoid repeated access, thereby reducing a waiting time of the processor 202, the AI computing power unit 212, or the AI processor 213, and improving computing efficiency. For example, the memory 203 may be a high-speed cache. The memory 203 may include a high-speed random access memory, or may include a nonvolatile memory, for example, at least one magnetic disk storage device, a flash memory device, or a universal flash storage (universal flash storage, UFS).

The AI apparatus 21 includes the AI processor 213. The AI processor 213 may be connected to the AI memory 210 through the high-speed interconnect network 22. The AI processor 213 may be a CPU, and the AI processor 213 may be configured to perform management and resource scheduling. In an example embodiment, the AI processor 213 may run an operating system or an application program.

The AI processor 213 is disposed in the AI apparatus 21, so that an AI computing task and a non-AI computing task can be allocated to different processors for execution. The AI computing power unit 212 executes the AI computing task, and the AI processor 213 executes the non-AI computing task, to implement computing task load sharing, thereby preventing the computing power of the AI computing power unit 212 from being occupied by non-AI computing tasks. In some embodiments, the storage device may further use the AI processor 213 to share computing tasks of the processor 202. For example, the processor 202 may send a task of obtaining the service data from the hard disk to the AI processor 213, and the AI processor 213 may execute the task of obtaining the service data from the hard disk, and send the obtained service data to the processor 202. In addition, performance advantages of different processors can be used. For example, due to an advantage of a CPU in performing logic control, the CPU is used as the AI processor 213 to perform resource scheduling and management. Due to an advantage of a GPU or an NPU in performing floating-point operations and parallel computing, the GPU or the NPU is used as the AI computing power unit 212 to perform model training or other AI computing, to improve overall AI computing efficiency and help AI acceleration. In addition, this can prevent an AI computing process from being interfered with by a resource scheduling management process, improve overall AI computing efficiency, and help AI acceleration.

Optionally, the processor 202 is further configured to obtain a segment of memory space from the memory 203 through division and reserve the segment of memory space for the AI apparatus 21.

Optionally, the AI processor 213 is configured to: when an available capacity of the AI memory 210 reaches a preset threshold, send a memory application request to the processor 202. The processor 202 is configured to obtain a segment of memory space from the memory 203 through division based on the memory application request and reserve the segment of memory space for the AI apparatus 21.

Optionally, the available capacity of the AI memory 210 is determined by a specified batchsize.

Optionally, the AI processor 213 is configured to: divide a computing task into at least two subtasks, and send a first subtask in the at least two subtasks to the processor 202. The processor 202 is further configured to: execute the first subtask, and send a computing result to the AI processor 213. This optional manner may be applied to a plurality of cases. For example, when it is determined that the computing power of the AI processor 213 is insufficient, the AI processor may execute the optional manner.
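A minimal sketch of this subtask split, assuming the computing task can be modeled as a divisible list of work items and that handing a subtask to the processor 202 can be modeled with a thread pool; all names below are illustrative, not an interface defined by this embodiment.

```python
from concurrent.futures import ThreadPoolExecutor

def split(task, n=2):
    """Divide a computing task (here, a list of work items) into n subtasks."""
    step = (len(task) + n - 1) // n
    return [task[i:i + step] for i in range(0, len(task), step)]

def run_on_processor(subtask):      # stands in for execution on the processor 202
    return sum(subtask)

def run_on_ai_processor(subtask):   # stands in for execution on the AI processor 213
    return sum(subtask)

task = list(range(100))
first, second = split(task)
with ThreadPoolExecutor() as ex:
    # The first subtask is sent to the processor; its computing result is
    # returned to the AI processor, which combines it with its own result.
    offloaded = ex.submit(run_on_processor, first)
    local = run_on_ai_processor(second)
result = offloaded.result() + local
assert result == sum(task)
```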

Optionally, the storage apparatus 20 may include the second network interface 204. The second network interface 204 is configured to perform network communication. The second network interface 204 may be a network interface card. The second network interface 204 may be connected to the AI apparatus 21, the processor 202, and the memory 203 through the high-speed interconnect network 22.

Optionally, the storage apparatus 20 may include a service interface 205.

In some possible embodiments, the storage apparatus 20 may be configured to: load the operating system to the AI apparatus 21, and start the AI apparatus 21 through the operating system. Specifically, the storage apparatus 20 may store an image file of the operating system. If the storage apparatus 20 needs to start the AI apparatus 21, the storage apparatus 20 may send the image file of the operating system to the AI apparatus 21 through the high-speed interconnect network 22, and load the image file of the operating system to the AI memory 210 in the AI apparatus 21. The operating system runs based on the image file of the operating system, to start the AI apparatus 21. The operating system may be LINUX™, UNIX™, WINDOWS™, or the like.

The storage apparatus 20 loads the operating system to the AI apparatus 21, and the operating system may not need to be installed in the AI apparatus 21, so as to prevent the operating system from occupying storage space of the AI apparatus 21, thereby ensuring a storage capacity of the AI apparatus 21. Particularly, if the storage apparatus 20 includes a plurality of AI apparatuses 21, operating systems of the plurality of AI apparatuses 21 are usually the same. Therefore, the storage apparatus 20 may load the same operating system to the AI apparatuses 21 in batches, to start the AI apparatuses 21 in batches. In this way, the same operating system can be prevented from occupying storage space of all the AI apparatuses 21, which would occur if the same operating system were installed on every AI apparatus 21, thereby saving storage space of all the AI apparatuses 21 and improving storage efficiency. In addition, the batch startup manner can improve startup efficiency of the plurality of AI apparatuses 21.

Optionally, the AI apparatus 21 may further include a nonvolatile storage medium. All or some of the steps of the following method embodiments may be implemented in a form of a computer software product. The computer software product may be stored in the nonvolatile storage medium in the AI apparatus 21 or in the AI memory 210 described above. The computer software product includes one or more instructions for enabling the AI apparatus 21 to perform all or some of the steps.

Optionally, the AI apparatus 21 and the storage apparatus 20 may be integrated together and sold or used as a set of products. In this case, the AI apparatus 21 and the storage apparatus 20 are not separated from each other. In some other possible embodiments, the AI apparatus 21 may alternatively be sold as an independent product and used together with the storage apparatus 20. For example, the AI apparatus 21 may be inserted into a high-speed interconnect bus of the storage device 2, or removed from the high-speed interconnect bus of the storage device 2, to implement contact with or separation from the storage apparatus 20.

It should be understood that the storage device 2 shown in FIG. 2 may include more or fewer components. For example, there may be one component, or there may be dozens or hundreds of components or more. For example, the AI apparatus 21 may include a plurality of AI computing power units 212. Quantities of components in the storage device 2 are not limited in this embodiment.

It should be understood that the structure shown in this embodiment does not constitute a specific limitation on the storage device 2. In some other embodiments of this application, the storage device 2 may include more or fewer components than those shown in FIG. 2 , or combine two or more components in FIG. 2 , or split one component in FIG. 2 into two or more nodes, or arrange two or more components in FIG. 2 at different locations.

The distributed storage system and the storage device provided in the embodiments of this application are described above, and method procedures applied to the distributed storage system and the storage device are described below by using an example.

FIG. 7 is a flowchart of a data processing method according to an embodiment of this application. The method may be applied to a storage device. The storage device may be any storage device shown in FIG. 1 or FIG. 2 . The method includes the following steps.

S71: A processor in the storage device stores service data in a hard disk.

For example, a client may send the service data to the storage device, and the storage device may receive the service data from the client and write the service data into the hard disk, to store the service data in the hard disk.

S72: An AI apparatus in the storage device sends a data obtaining request to the processor in the storage device to obtain the service data, and performs AI computing on the service data.

Optionally, a computing resource of the processor and a computing resource of the AI apparatus in the storage device may constitute a resource pool. The storage device may schedule a resource in the resource pool to process AI computing. Overall AI computing power of the storage device can be increased through collaboration of the computing power of the processor and the computing power of the AI apparatus, thereby accelerating AI computing.

The AI computing process may be but is not limited to one or both of the following cases (1) and (2).

(1) Model Training

Specifically, the storage device may perform model training on a sample set through the computing resource of the processor and the computing resource of the AI apparatus in the storage device, to obtain an AI model. For example, the AI model may be a neural network model. For example, an image recognition model may be obtained through sample image training, and the image recognition model may be but is not limited to a convolutional neural network (convolutional neural networks, CNN), or certainly may be another CNN-based neural network such as a region CNN (region-CNN, R-CNN), a fast R-CNN (fast R-CNN), a faster R-CNN (faster R-CNN), or a single shot multibox detector (single shot multibox detector, SSD). For another example, the AI apparatus may obtain a semantic recognition model through sample text training, and the semantic recognition model may be but is not limited to a recurrent neural network (recurrent neural networks, RNN). For another example, the AI apparatus may obtain a speech recognition model through sample speech training.

(2) Inference and Prediction

Specifically, the storage device may input to-be-identified service data into a trained AI model, and perform service data inference through the AI model, the computing resource of the processor, and the computing resource of the AI apparatus, to obtain an identification result of the service data. For example, a to-be-recognized image may be input into the image recognition model, and an image recognition result is output. The image recognition result may be but is not limited to one or more of an image category, an image feature, or a location of an object in the image. The image category may indicate a specific object included in the image. For example, in a facial recognition scenario, the image recognition model is a facial recognition model, and the image category may be a face category. In a scene recognition scenario, the image recognition model is a scene recognition model, and the image category may be a scene category, for example, a ceiling, a lawn, or the ground. In a character recognition scenario, the image recognition model is a character recognition model, and the image category may be a character category. The image feature may be but is not limited to a one-dimensional feature value, a two-dimensional feature map, a three-dimensional feature cube, or a higher-dimensional tensor. The location of the object in the image may be represented by coordinates of a bounding box (bounding box) in which the object is located. For another example, a to-be-recognized text may be input into the semantic recognition model, and a semantic recognition result is output. For another example, a to-be-recognized speech may be input into the speech recognition model, and a speech recognition result is output.

The processor and the AI apparatus are disposed in the storage device provided in this embodiment, so that the storage device can provide an AI computing capability through the AI apparatus and provide a service data storage capability through the processor, thereby implementing convergence of storage and AI computing power. An AI parameter and the service data are transmitted inside the storage device through a high-speed interconnect network without a need of being forwarded through an external network. Therefore, a path for transmitting the service data and the AI parameter is greatly shortened, and the service data can be loaded nearby, thereby accelerating loading. In addition, the AI apparatus can borrow a computing resource of the processor to process the service data, so that the computing power of the AI apparatus is increased, thereby accelerating AI computing.

In the embodiment in FIG. 7 , the AI apparatus may specifically obtain the service data in a plurality of implementations. An embodiment in FIG. 8 , an embodiment in FIG. 9 , and an embodiment in FIG. 12 are used as examples for description below.

In some embodiments, the AI apparatus can obtain the service data from the storage device nearby. An implementation of obtaining the service data nearby is described below by using the embodiment in FIG. 8 and the embodiment in FIG. 9 as examples.

For example, FIG. 8 is a flowchart of a data processing method according to an embodiment of this application. A procedure in which the AI apparatus obtains the service data nearby may include the following steps.

S81: An AI processor sends a first data obtaining request to the processor.

The AI processor may generate the first data obtaining request, and send the first data obtaining request to the processor through a high-speed interconnect network. The first data obtaining request is used to request the service data stored in the hard disk. The first data obtaining request may carry an identifier of the service data. The identifier of the service data may be an ID of the service data.

S82: In response to the first data obtaining request, the processor obtains the service data from the hard disk, and sends the service data to the AI apparatus.

The processor may receive the first data obtaining request through the high-speed interconnect network. The processor may parse the first data obtaining request to obtain the identifier of the service data that is carried in the first data obtaining request. The processor may determine an address of the service data in the hard disk based on the identifier of the service data. The processor may access the hard disk based on the address of the service data, to obtain the service data stored in the hard disk. The processor may return the service data to the AI apparatus through the high-speed interconnect network.

S83: The AI apparatus receives the service data sent by the processor, and performs AI computing on the service data.

According to the method in which the AI apparatus obtains the service data nearby provided in this embodiment, the storage device includes the AI apparatus, the processor, and the hard disk. Therefore, when the AI apparatus needs to obtain the service data, the AI apparatus sends the data obtaining request to the processor. The processor in the storage device obtains the service data from the hard disk, and sends the service data to the AI processor, so that the AI processor can locally obtain the service data. This avoids communication overheads caused by requesting the service data from a remote storage device through a network, and shortens a delay of obtaining the service data.
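The S81 to S83 exchange can be summarized in the following Python sketch. The HardDisk, Processor, and AIApparatus classes and the identifier-to-address index are assumptions made for illustration, not structures defined by this embodiment.

```python
class HardDisk:
    def __init__(self):
        self.blocks = {}                 # address -> service data

class Processor:
    def __init__(self, disk):
        self.disk = disk
        self.index = {}                  # service-data identifier -> address on disk

    def handle_first_data_obtaining_request(self, data_id):
        address = self.index[data_id]    # determine the address from the identifier
        return self.disk.blocks[address] # obtain the data and return it (S82)

class AIApparatus:
    def __init__(self, processor):
        self.processor = processor

    def load(self, data_id):
        # S81 + S83: send the request, receive the service data, then compute on it.
        return self.processor.handle_first_data_obtaining_request(data_id)

disk = HardDisk()
disk.blocks[0x1000] = b"sample image bytes"
cpu = Processor(disk)
cpu.index["img-001"] = 0x1000
ai = AIApparatus(cpu)
assert ai.load("img-001") == b"sample image bytes"
```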

The embodiment in FIG. 8 provides a procedure of obtaining the service data nearby. In some other embodiments of this application, the service data may alternatively be obtained nearby in another implementation. The embodiment in FIG. 9 is used as an example for description below.

FIG. 9 is a schematic diagram of a data processing method according to an embodiment of this application. The procedure in which the AI apparatus obtains the service data nearby may include the following steps.

S91: The AI apparatus sends a second data obtaining request to the processor.

When the AI apparatus needs to obtain the service data from the processor, the AI processor may generate the second data obtaining request, and send the second data obtaining request to the processor through the high-speed interconnect network. The second data obtaining request is used to request to obtain the service data. The second data obtaining request may carry an identifier of the service data. The identifier of the service data may be an ID of the service data.

S92: The processor sends metadata of the service data to the AI apparatus in response to the second data obtaining request, where the metadata is used to indicate an address of the service data.

The processor may parse the second data obtaining request to obtain the identifier of the service data that is carried in the second data obtaining request, obtain the metadata of the service data, and send the metadata of the service data to the AI processor through the high-speed interconnect network. The metadata is used to indicate the address of the service data, that is, a storage location of the service data. For example, the metadata may indicate a start address of the service data and a length of the service data, or the metadata may indicate a start address and an end address of the service data.

FIG. 10 is a schematic diagram of interaction between the AI processor and the processor in the storage device.

Optionally, the processor may determine, based on the address of the service data, whether the service data can be sent to the AI apparatus through DMA. If the service data can be sent to the AI apparatus through DMA, the metadata of the service data is sent to the AI apparatus, to trigger the method procedure provided in the embodiment in FIG. 9 . In addition, the processor may determine whether the service data can be sent to the AI apparatus through RDMA. If the service data can be sent to the AI apparatus through RDMA, the metadata of the service data is sent to the AI apparatus, to trigger a method procedure provided in an embodiment in FIG. 12 . In addition, if the service data cannot be sent to the AI apparatus through DMA or RDMA, the service data is loaded from the hard disk to a memory, and then the service data is transmitted from the memory to an AI memory.
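One possible form of this path-selection logic is sketched below in Python; the is_local flag and the capability parameters are assumptions standing in for whatever checks a concrete implementation would perform.

```python
from types import SimpleNamespace

def select_transfer_path(metadata, dma_capable, rdma_capable):
    """Choose how the service data reaches the AI memory, per the options above."""
    if metadata.is_local and dma_capable:
        # Local data, DMA available: send metadata and let the AI apparatus
        # issue a DMA request to the hard disk (embodiment in FIG. 9).
        return "DMA: hard disk writes directly into the AI memory"
    if not metadata.is_local and rdma_capable:
        # Remote data, RDMA available: send metadata and let the AI apparatus
        # issue an RDMA request to the peer device (embodiment in FIG. 12).
        return "RDMA: peer storage device writes into the AI memory"
    # Neither pass-through path is available: stage through the memories.
    return "fallback: hard disk -> memory -> AI memory"

print(select_transfer_path(SimpleNamespace(is_local=True),
                           dma_capable=True, rdma_capable=False))
```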

S93: When the metadata indicates that the service data is located in the storage device, the AI apparatus sends a first data access request to the hard disk.

The first data access request includes the metadata of the service data, and the first data access request may be a DMA request. In an example embodiment, the AI processor in the AI apparatus may determine the address of the service data based on the metadata of the service data, and determine whether the service data is located in the hard disk or another storage device. If determining that the service data is located in the hard disk, the AI processor generates the first data access request based on the metadata, and sends the first data access request to the hard disk through the high-speed interconnect network.

S94: The hard disk obtains the service data based on the metadata, and writes the service data into the AI apparatus through DMA.

The hard disk may parse the first data access request to obtain the metadata carried in the first data access request, determine the address of the service data based on the metadata, and access the address to obtain the service data. A DMA path may be established between the hard disk and the AI memory, and the hard disk may send the service data to the AI memory through the DMA path. The hard disk may be a smart disk. The hard disk may include a CPU, and the hard disk may write the service data through the CPU in the hard disk.

For example, FIG. 11 is a schematic diagram of service data transmission according to an embodiment of this application. It can be learned from FIG. 11 that the service data stored in the hard disk can be transmitted to the AI memory through DMA over the high-speed interconnect network.

According to the method provided in this embodiment, DMA pass-through between the AI apparatus and the hard disk can be implemented. The DMA path is established between the AI apparatus and the hard disk, so that the AI apparatus and the hard disk can quickly exchange the service data with each other through the DMA path. This accelerates service data loading by the AI apparatus, increases the amount of service data that can be simultaneously processed by the AI apparatus, reduces communication overheads for transmitting an AI parameter between AI apparatuses, and accelerates AI training.

In some other embodiments, the AI apparatus can obtain the service data from another storage device. A method for obtaining the service data from the other storage device is described below by using the embodiment in FIG. 12 as an example. Specifically, referring to FIG. 12 , an embodiment of this application provides a data processing method. The method may be applied to a storage device. The storage device may be any storage device shown in FIG. 1 or FIG. 2 . The method includes the following steps.

S1201: The AI apparatus sends a third data obtaining request to the processor.

S1202: The processor sends metadata of the service data to the AI apparatus in response to the third data obtaining request.

S1203: When the metadata indicates that the service data is located in another storage device, the AI apparatus sends a second data access request to the other storage device.

The AI processor may determine an address of the service data based on the metadata of the service data, and determine, based on the address of the service data, whether the service data is locally stored or stored in the other storage device. If determining that the service data is stored in the other storage device, the AI processor generates the second data access request, and sends the second data access request to a first network interface. The first network interface may send the second data access request to the other storage device. The second data access request may include the metadata of the service data. The second data access request may be an RDMA request. The second data access request may include a destination address in the AI memory in the storage device, to instruct the other storage device to write the service data to the destination address.

S1204: The other storage device sends the service data to the AI apparatus in response to the second data access request.

For example, the other storage device may write the service data into the AI memory through RDMA. Specifically, the other storage device may parse the second data access request to obtain the metadata of the service data that is carried in the second data access request, determine the address of the service data based on the metadata of the service data, and access the address in a storage medium to obtain the service data. In addition, the other storage device may establish an RDMA path between the other storage device and the AI memory in the AI apparatus based on the second data access request, and write the service data into the AI memory in the AI apparatus through the RDMA path. Specifically, the other storage device may generate an RDMA message based on the destination address in the AI memory and the service data, and send the RDMA message to the AI apparatus. The RDMA message includes the service data and the destination address in the AI memory. The first network interface in the AI apparatus may receive the RDMA message, and parse the RDMA message to obtain the service data and the destination address in the AI memory that are carried in the RDMA message. The first network interface then accesses the AI memory through the high-speed interconnect network, and writes the service data into the destination address in the AI memory.

For example, FIG. 13 is a schematic diagram of service data transmission according to an embodiment of this application. It can be learned from FIG. 13 that the service data stored in the other storage device can be transmitted to the AI memory through RDMA over the high-speed interconnect network.

In this optional manner, RDMA pass-through between the AI memory in the AI apparatus and the other storage device is implemented, and the AI memory and the other storage device can quickly exchange the service data with each other. This accelerates AI training.
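The following Python sketch mimics steps S1203 and S1204: the second data access request carries the metadata plus a destination address in the local AI memory, and the peer device answers by writing the payload at that address, which is what the RDMA path achieves without involving the requester's processor. The classes and the dictionary-based message are illustrative assumptions.

```python
class AIMemory:
    def __init__(self, size):
        self.buf = bytearray(size)

class PeerStorageDevice:
    def __init__(self, data_by_address):
        self.data_by_address = data_by_address   # storage medium contents

    def handle_second_data_access_request(self, metadata, dest, ai_memory):
        # Locate the service data based on the metadata in the request.
        data = self.data_by_address[metadata["address"]]
        # "RDMA write": place the payload directly at the destination address
        # in the requester's AI memory, bypassing the requester's processor.
        ai_memory.buf[dest:dest + len(data)] = data

ai_mem = AIMemory(4096)
peer = PeerStorageDevice({0x2000: b"remote sample"})
peer.handle_second_data_access_request({"address": 0x2000}, dest=0, ai_memory=ai_mem)
assert bytes(ai_mem.buf[:13]) == b"remote sample"
```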

Optionally, the AI apparatus may provide AI computing power through an AI computing power unit. For example, referring to FIG. 14 , an embodiment of this application provides an AI computing processing method. The method may be applied to a storage device. The storage device may be any storage device shown in FIG. 1 or FIG. 2 . The method is executed by an AI computing power unit in the storage device, and includes the following steps.

S1401: The AI computing power unit obtains the service data from the AI memory.

The AI computing power unit may communicate with the AI memory through the high-speed interconnect network. The AI computing power unit may access the AI memory through the high-speed interconnect network, to obtain the service data cached in the AI memory.

S1402: The AI computing power unit performs AI computing on the service data.

For details about AI computing, refer to step S72 in the embodiment in FIG. 7 . Details are not described herein again.

In this optional manner, the AI apparatus can provide the AI computing power through the AI computing power unit, so as to prevent AI computing from occupying computing power of a storage apparatus, thereby preventing AI computing from severely affecting performance of the storage device.

Optionally, the memory and the AI memory in the storage device may implement memory resource collaboration. The memory resource collaboration may be implemented in a plurality of manners. An embodiment in FIG. 15 and an embodiment in FIG. 16 are used as examples for description below.

In some embodiments, the AI apparatus can borrow a memory in the storage apparatus to perform AI computing. Referring to FIG. 15 , an embodiment of this application provides a data processing method. The method may be applied to the storage device. The storage device may be the first storage device 10 or the second storage device 11 in the system 1 shown in FIG. 1 , or may be the storage device 2 shown in FIG. 2 . Interaction bodies of the method include the AI processor in the storage device and the processor in the storage device. The method includes the following steps.

S1501: The AI processor determines that an available capacity of the AI memory reaches a preset threshold.

S1502: The AI processor sends a memory application request to the processor.

The memory application request is used to borrow a memory of the processor. The memory application request may carry a size of to-be-borrowed memory space. It can be understood that the trigger condition specified in S1501 is merely an example. The AI processor may alternatively send the memory application request to the processor in another case or in a case in which there is no trigger condition.

S1503: The processor obtains a segment of memory space from the memory through division based on the memory application request and reserves the segment of memory space for the AI processor.

S1504: The AI processor performs AI computing through the memory space of the memory in the storage device.

The AI processor may access, through a PCIe bus or a memory fabric, the memory space reserved by the processor in the storage device, and perform AI computing through the memory space of the memory and the memory space of the AI memory. In an AI computing process, data can be quickly exchanged between the memory and the AI memory through DMA, PCIe, or the memory fabric. Because larger memory space increases the amount of data that a single AI processor can process in one batch, overheads for parameter communication between different AI processors can be reduced, and AI training can be accelerated. In addition, when the AI processor stops performing AI computing and no longer uses the memory space of the memory in the storage device, the AI processor may send a memory release request to the processor, so that the processor releases the reserved memory space and returns it to the memory.

In a related technology, GPU memory is fixed, and consequently there is frequently insufficient memory for AI computing. In this optional manner, the AI apparatus can borrow the memory in the storage apparatus to perform AI computing, so that the available memory space of the AI apparatus is expanded and the AI apparatus can perform AI computing in larger memory. This improves AI computing efficiency.

It should be noted that obtaining memory space through division and reserving it for the AI processor under the trigger condition that the AI processor sends the memory application request is only an optional manner. In some other embodiments, the processor in the storage device may alternatively obtain a segment of memory space from the memory through division and reserve it for the AI apparatus in another case. For example, the processor in the storage device may actively obtain and reserve the memory space for the AI apparatus, for instance when determining that the available capacity of the memory is greater than the preset threshold. For another example, the processor in the storage device may monitor the available capacity of the AI memory, and when determining that the available capacity of the AI memory reaches a preset threshold, obtain memory space through division and reserve it for the AI apparatus. It should be understood that the application scenario in which the processor in the storage device obtains and reserves memory space for the AI processor is not limited in this embodiment; in some embodiments, the processor in the storage device may do so under any preset condition.

Optionally, the available capacity of the AI memory is determined by a specified batch size (batchsize). The batch size is the amount of data used for one time of training. Specifically, the AI processor may determine, based on the specified batch size, the capacity of the AI memory that is required for performing AI computing. The AI processor may compare the available capacity of the AI memory with the preset threshold. If the available capacity of the AI memory reaches the preset threshold, the AI memory has insufficient memory space, and a procedure of borrowing the memory in the storage device is triggered.

In this optional manner, the AI processor may perform training through the memory space of the memory in the storage device. Because larger available memory space allows a larger batch size for AI training, the amount of service data that the AI apparatus can process in one batch is increased, communication overheads for exchanging an AI parameter between different AI apparatuses are reduced, and AI training is accelerated. Experiments show that, if AI training is performed only through the AI memory, the maximum batch size is 256; in this manner, the batch size may be set to 32000. Therefore, the batch size is significantly increased.
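To make the relationship between batch size and memory concrete, the following sketch estimates the AI memory a batch would need and triggers the borrowing procedure of FIG. 15 when the estimate exceeds the AI memory. The per-sample size and the activation multiplier are invented for illustration; real footprints depend on the model.

```python
def ai_memory_needed(batch_size, bytes_per_sample, activation_factor=3):
    # Rough estimate: inputs plus intermediate activations for one batch.
    return batch_size * bytes_per_sample * activation_factor

AI_MEMORY_BYTES = 32 << 30        # assume a 32 GiB AI memory
PER_SAMPLE = 600 << 10            # assume ~600 KiB per face image sample

if ai_memory_needed(32000, PER_SAMPLE) > AI_MEMORY_BYTES:
    # The available AI memory reaches the preset threshold, so the
    # AI processor would send a memory application request (S1502).
    print("trigger the memory borrowing procedure of FIG. 15")
```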

In some embodiments, when the storage device reads/writes an AI parameter, the AI memory may be used preferentially, and the memory in the storage device is used next, to accelerate AI parameter read/write by exploiting the fact that the AI memory usually has better performance than the memory in the storage device (for example, a faster access speed). Specifically, the AI memory may serve as a first level and the memory may serve as a second level, to perform layered AI parameter caching, where the priority of the first level is higher than that of the second level. In this manner, the memory and the AI memory in the storage device are layered, and an AI parameter is preferentially cached in the AI memory. In a possible implementation, a medium controller may establish a mapping relationship between the AI memory and the memory in the storage device; if the AI memory overflows, an AI parameter cached in the AI memory is stored in the memory in the storage device.
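A minimal sketch of such two-level caching is given below, assuming dictionary-backed levels and byte-size accounting; an actual medium controller would manage physical address mappings rather than Python objects, and the names here are illustrative assumptions.

```python
class TieredParameterCache:
    """Two-level AI parameter cache: AI memory first, memory second."""

    def __init__(self, ai_capacity, mem_capacity):
        self.l1, self.l1_free = {}, ai_capacity   # AI memory (first level)
        self.l2, self.l2_free = {}, mem_capacity  # memory (second level)

    def put(self, key, param, size):
        if size <= self.l1_free:        # prefer the AI memory
            self.l1[key] = param
            self.l1_free -= size
        elif size <= self.l2_free:      # AI memory overflows: spill to memory
            self.l2[key] = param
            self.l2_free -= size
        else:
            raise MemoryError("both cache levels are full")

    def get(self, key):
        if key in self.l1:              # the first level has the higher priority
            return self.l1[key]
        return self.l2.get(key)


cache = TieredParameterCache(ai_capacity=1024, mem_capacity=4096)
cache.put("layer1.weight", b"...", size=800)   # lands in the AI memory
cache.put("layer2.weight", b"...", size=800)   # AI memory full: spills to the memory
```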

In some embodiments, the storage apparatus can borrow the memory in the AI apparatus to read/write the service data, to accelerate service data access or service data storage. Referring to FIG. 16, an embodiment of this application provides a data processing method. The method may be applied to the storage device. The storage device may be any storage device shown in FIG. 1 or FIG. 2. The method includes the following steps.

S1601: The processor in the storage device determines that an available capacity of the memory in the storage device reaches a preset threshold.

S1602: The processor in the storage device sends a memory application request to the AI processor.

The memory application request is used to borrow an AI memory of the AI processor. The memory application request may carry a size of to-be-borrowed memory space. It can be understood that the trigger condition specified in S1601 is merely an example. The processor in the storage device may alternatively send the memory application request to the AI processor in another case, or without any trigger condition.

S1603: The AI processor obtains a segment of memory space from the AI memory through division based on the memory application request and reserves the segment of memory space for the processor in the storage device.

S1604: The processor in the storage device reads/writes the service data through the memory space of the AI memory.

The processor in the storage device may access, through a high-speed interconnect network or a memory fabric, the memory space reserved by the AI processor, and read/write the service data through the memory space of the AI memory and the memory space of the memory. In a service data read/write process, data can be quickly exchanged between the AI memory and the memory through DMA, the high-speed interconnect network, or the memory fabric. Because larger memory space increases the amount of data that a single processor can process in one batch, communication overheads between different processors are reduced, and service data read/write is accelerated. In addition, when the processor in the storage device stops reading or writing the service data and no longer uses the memory space of the AI memory, the processor may send a memory release request to the AI processor, so that the AI processor releases the reservation and reclaims the memory space.

In an example application scenario, a user annotates a face image set, and needs to store the face image set in the storage device to subsequently perform AI training through the face image set. In this case, the user may trigger the client, so that the client sends the face image set to the storage device. After the storage device receives the face image set, the processor in the storage device usually first caches the face image set in the memory, and then writes the face image set from the memory into the hard disk, to implement persistent storage of the face image set. Because the data amount of the face image set is huge, the capacity of the memory in the storage device may not meet the requirement, and consequently the available capacity of the memory in the storage device reaches the preset threshold. In this case, the processor in the storage device may perform step S1601 to step S1604 to borrow memory space of the AI memory, and cache the face image set in larger memory space, thereby accelerating storage of the face image set.

In a related technology, the capacity of the memory in the storage device is fixed, and consequently there is frequently insufficient memory for storing the service data. In this optional manner, the storage apparatus can borrow the AI memory in the AI apparatus to read/write the service data, so that the available memory space of the storage apparatus is expanded, and the storage apparatus can store the service data in larger memory. This reduces the service data read/write time, and improves service data read/write efficiency.

Optionally, the processor and the AI processor in the storage device may implement computing power collaboration. The computing power collaboration may be implemented in a plurality of manners. The embodiments in FIG. 17 and FIG. 18 are used as examples for description below.

In some embodiments, the AI processor can borrow computing power of the processor in the storage device to process AI computing, to support collaboration between the computing power of the processor in the storage device and the computing power of the AI processor. Referring to FIG. 17, an embodiment of this application provides a data processing method. The method may be applied to the storage device. The storage device may be any storage device shown in FIG. 1 or FIG. 2. The method includes the following steps.

S1701: When it is determined that the computing power of the AI processor is insufficient, the AI processor divides a computing task into at least two subtasks.

In a process of performing AI training and resource scheduling, the AI processor performs computing operations that occupy its CPU, and consequently its computing power may become insufficient. In this case, the AI processor may divide the computing task, to share the computing task with the processor.

S1702: The AI processor sends a first subtask in the at least two subtasks to the processor in the storage device.

The first subtask is a task that is in the at least two subtasks and that is to be executed by the processor in the storage device. The AI processor may send the first subtask through a high-speed interconnect network.

S1703: The processor in the storage device executes the first subtask.

S1704: The processor in the storage device sends a computing result to the AI processor.

After completing the first subtask, the processor in the storage device may feed back the computing result of the first subtask to the AI processor. The AI processor may receive the computing result, and execute a next computing task based on the computing result, or schedule an AI computing power unit based on the computing result.
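Steps S1701 to S1704 amount to a split-dispatch-collect pattern. The sketch below imitates it with a thread pool standing in for the processor in the storage device; in the real device the first subtask would travel over the high-speed interconnect network, and the function names here are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def run_with_offload(task_parts, ai_compute, storage_compute):
    """Divide a computing task (S1701), send the first subtask to the
    storage device's processor (S1702/S1703), compute the rest locally,
    and collect the offloaded result (S1704)."""
    first, *rest = task_parts
    with ThreadPoolExecutor(max_workers=1) as storage_cpu:
        pending = storage_cpu.submit(storage_compute, first)  # S1702/S1703
        local_results = [ai_compute(part) for part in rest]
        return [pending.result(), *local_results]             # S1704

# Toy usage: both "processors" just sum their subtask's numbers.
print(run_with_offload([[1, 2], [3, 4], [5, 6]], sum, sum))   # [3, 7, 11]
```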

In this optional manner, when the computing power of the AI processor is insufficient, the AI processor can borrow the computing power of the processor in the storage device to process AI computing, to implement collaboration between the computing power of the AI processor and the computing power of the processor in the storage device, and increase the computing power available to the AI processor, thereby accelerating AI training.

In some embodiments, the processor in the storage device can borrow computing power of the AI processor to read/write the service data, to accelerate service data access or service data storage. Referring to FIG. 18, an embodiment of this application provides a data processing method. The method may be applied to the storage device. The storage device may be any storage device shown in FIG. 1 or FIG. 2. The method includes the following steps.

S1811: When it is determined that the computing power of the processor in the storage device is insufficient, the processor in the storage device divides a computing task into at least two subtasks.

In a process of processing the service data, the processor in the storage device performs read operations or write operations that occupy its CPU, and consequently its computing power may become insufficient. In this case, the processor in the storage device may divide the computing task, to share the computing task with the AI processor.

S1812: The processor in the storage device sends a second subtask in the at least two subtasks to the AI processor.

The second subtask is a task that is in the at least two subtasks and that is to be executed by the AI processor. For example, the second subtask may be obtaining the service data from the hard disk. The processor may send the second subtask through a high-speed interconnect network.

S1813: The AI processor executes the second subtask.

S1814: The AI processor sends a computing result to the processor in the storage device.

After completing the second subtask, the AI processor may feed back the computing result of the second subtask to the processor in the storage device. The processor in the storage device may receive the computing result, and execute a next computing task based on the computing result.

For example, if the second subtask is obtaining the service data from the hard disk, after obtaining the service data from the hard disk, the AI processor may send the obtained service data to the processor in the storage device.

In this optional manner, when the computing power of the processor in the storage device is insufficient, the processor in the storage device can borrow the computing power of the AI processor to read/write the service data, to implement collaboration between the computing power of the AI processor and the computing power of the processor in the storage device, and increase the computing power available to the processor in the storage device, thereby accelerating service data read/write.

In an optional embodiment, before the storage device processes AI computing, the processor in the storage device may load an operating system to the AI apparatus, and the processor in the storage device starts the AI apparatus through the operating system. In another optional embodiment, the AI apparatus may alternatively pre-store an operating system; the processor in the storage device may send a start instruction to the AI apparatus, and the AI apparatus may receive the start instruction, load the operating system in response to it, and run the operating system, so that the AI apparatus is started. The processor may receive an AI computing instruction from a terminal or an upper-layer application. The AI computing instruction is used to instruct the processor to perform AI computing. The processor may load the operating system to the AI apparatus when triggered by the AI computing instruction. Certainly, the processor may alternatively load the operating system to the AI apparatus on another occasion. This is not limited in this embodiment.

The foregoing embodiments describe a method procedure in which a single storage device processes AI computing. In some embodiments of this application, a plurality of storage devices in the distributed storage system may collaborate to perform AI computing, thereby enhancing overall AI computing power. Details are described below.

FIG. 19 is a flowchart of a data processing method according to an embodiment of this application. The method may be applied to the system 1 shown in FIG. 1. Interaction bodies of the method include a first storage device and a second storage device in the storage system. The method includes the following steps.

S1901: A first AI apparatus in the first storage device and a second AI apparatus in the second storage device transmit an AI parameter to each other through a first network.

In a process in which the first storage device and the second storage device process AI computing, the two devices may exchange the AI parameter with each other. Specifically, AI computing is usually performed based on a neural network model, and a computing process based on the neural network model mainly includes two parts: a forward propagation algorithm and a backward propagation algorithm (back propagation neural networks, BP). The forward propagation algorithm is used to compute an output result of the neural network model. In a process of running the forward propagation algorithm, data is computed and transmitted layer by layer, from an input layer of the neural network model through one or more hidden layers to an output layer, until the data is output from the output layer. The backward propagation algorithm is used to reduce an error between the output result of the model and an actual result. In a process of running the backward propagation algorithm, the neural network model is optimized by adjusting an AI parameter, for example, a weight of each neuron. In summary, running the neural network model involves cyclic iteration of the forward propagation algorithm and the backward propagation algorithm, during which the AI parameter needs to be exchanged so that the model can be continually optimized through AI parameter adjustment.
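As a concrete instance of the parameter adjustment performed by the backward propagation algorithm, gradient descent updates each weight against the gradient of the error. The following expression is the conventional update rule, given here only as an illustration; the learning rate is a training hyperparameter, not a value specified in this application:

```latex
w_{ij} \leftarrow w_{ij} - \eta \cdot \frac{\partial E}{\partial w_{ij}}
```

where w_{ij} is the weight between neuron i and neuron j, E is the error between the output result and the actual result, and η is the learning rate. The updated weights are the AI parameter exchanged between the AI apparatuses in each iteration.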

When the first AI apparatus obtains the AI parameter through computing, the first AI apparatus may output the AI parameter through a first network interface. The first network may receive the AI parameter from the first network interface in the first AI apparatus, and send the received AI parameter to a first network interface in the second AI apparatus. The second AI apparatus may receive the input AI parameter from its first network interface.

Optionally, if the storage system further includes a host configured to process AI computing, the first network may be used to transmit the AI parameter between the first storage device, the second storage device, and the host.

S1902: A first storage apparatus in the first storage device and a second storage apparatus in the second storage device transmit service data to each other through a second network.

For example, FIG. 20 is a schematic diagram of data transmission according to an embodiment of this application. It can be learned from FIG. 20 that an AI parameter may be transmitted between AI apparatuses in different storage devices through the first network, and service data may be transmitted between storage apparatuses in different storage devices through the second network.

According to the method provided in this embodiment, two storage devices exchange an AI parameter with each other through their respective AI apparatuses over the first network, and exchange service data with each other through their respective storage apparatuses over the second network, to collaboratively perform AI computing based on the AI parameter and the service data. Storage capabilities and AI computing power of a plurality of storage devices are converged, so that the overall storage capability and AI computing power of the system can be increased.

Optionally, storage apparatuses may exchange service data with each other through the first network. Specifically, referring to FIG. 21, an embodiment of this application provides a service data transmission method. The method may be applied to the system 1 shown in FIG. 1. Interaction bodies of the method include a first storage apparatus and a second storage apparatus. The method includes the following steps.

S2111: The first storage apparatus determines that a quantity of network resources of the second network is less than a specified storage network resource threshold.

The first storage apparatus or a management apparatus may detect the quantity of network resources of the second network of the first storage apparatus, determine whether this quantity is less than the specified storage network resource threshold, and if so, perform step S2112. In some embodiments, the first storage apparatus or the management apparatus may further detect a quantity of network resources of the first network of the first AI apparatus and a quantity of network resources of the first network of the second AI apparatus, and determine whether each of these quantities is greater than a specified AI network resource threshold. If the quantity of network resources of the second network is less than the specified storage network resource threshold, and the quantities of network resources of the first network of the first AI apparatus and of the second AI apparatus are both greater than the specified AI network resource threshold, network resources of the second network are currently insufficient while there are sufficient AI network resources, and step S2112 may be performed. In addition, it can be understood that, even without the restrictive condition in S2111, the storage apparatuses may exchange the service data with each other through the first network.

S2112: Other service data is transmitted between the first storage apparatus and the second storage apparatus through the first network.

For example, the first storage apparatus is a source device, and the second storage apparatus is a destination device. A procedure of transmitting the other service data may include: the first network interface in the first AI apparatus accesses a hard disk in the first storage apparatus through a high-speed interconnect network, reads the other service data from the hard disk in the first storage apparatus, and sends the other service data to the second storage device through the first network; the first network interface in the second AI apparatus receives the other service data through the first network, and writes the other service data into a memory in the second storage apparatus through the high-speed interconnect network.

In an example scenario, when the distributed storage system starts a job and each storage device loads service data for the first time, congestion occurs in the second network because the storage devices load the service data through the second network. In this case, because the AI apparatus has not started the job or has just started the job, the first network is usually idle and therefore can be used.

The first network is used to transmit the service data, to accelerate service data transmission, so that the service data can be quickly loaded to each storage device through collaboration between the first network and the second network.

In a related technology, the service data is usually transmitted only through the second network, which is equivalent to having only one forwarding path. This optional manner provides a new path for the storage apparatuses to exchange the service data with each other: when network resources of the second network are insufficient, the first network is used to exchange the service data, serving as a newly added forwarding path. The service data may then be transmitted through the second network or through the first network. This increases the network bandwidth for transmitting the service data, shortens the delay of exchanging the service data, accelerates service data exchange, and accelerates AI computing.
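The selection logic described above can be summarized as a small decision function. The following sketch assumes the free-resource quantities (for example, available bandwidth) and the thresholds are known numbers; the parameter names are invented for illustration.

```python
def pick_network(second_net_free, first_net_free,
                 storage_threshold, ai_threshold):
    # S2111: the second network is short of resources while the first
    # network (the AI network) has spare capacity ...
    if second_net_free < storage_threshold and first_net_free > ai_threshold:
        return "first network"    # S2112: borrow the AI network as a new path
    return "second network"       # otherwise keep the normal storage path

assert pick_network(10, 900, storage_threshold=100, ai_threshold=500) == "first network"
```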

Optionally, memory pass-through between different storage devices may be implemented through RDMA. Specifically, referring to FIG. 22, an embodiment of this application provides a data processing method. The method may be applied to the system 1 shown in FIG. 1. Interaction bodies of the method include a first AI apparatus in a first storage device and a second AI apparatus in a second storage device. The method includes the following steps.

S2211: The first AI apparatus sends a network resource request of the first network to a second AI processor, and sends a memory RDMA access request to the second processor.

The network resource request of the first network is used to request to occupy some network resources of the first network of the second AI apparatus to transmit service data. The memory RDMA access request is used to access a second memory through RDMA. The memory RDMA access request may include a destination address in the second memory, to indicate that the service data needs to be written to that destination address.

In an optional embodiment, a first AI processor in the first AI apparatus may generate the network resource request of the first network and the memory RDMA access request, and send both to a first network interface. The first network interface may send the network resource request of the first network to the second AI processor, and send the memory RDMA access request to the second processor. After receiving the network resource request of the first network, the second AI processor may obtain some network resources of the first network through division and reserve those network resources for the first AI apparatus. After receiving the memory RDMA access request, the second processor may generate a memory RDMA access response and return it to the first AI apparatus. The memory RDMA access response carries the destination address in the second memory, so that the first AI apparatus writes the data to that destination address through RDMA. According to the foregoing procedure, an RDMA path between the second AI apparatus and the first AI apparatus may be established.

S2212: The first AI apparatus reads the other service data from the first memory.

The first AI apparatus may include a first network interface that supports an RDMA function. The first network interface in the first AI apparatus may access the first memory through a high-speed interconnect network, and read the other service data from the first memory based on an address of the other service data in the first memory.

S2213: The first AI apparatus sends the other service data to the second AI apparatus through the first network.

The first network interface in the first AI apparatus may send the read other service data, and the other service data may be transmitted to the second AI apparatus through the first network.

S2214: The second AI apparatus writes the other service data into the second memory.

Specifically, the second AI apparatus may include a first network interface that supports an RDMA function. The first network interface in the second AI apparatus may receive the other service data from the first AI apparatus, and write the other service data to the destination address in the second memory through the high-speed interconnect network.

In addition, after the second AI apparatus completes the write operation on the other service data, the first AI apparatus and the second AI apparatus may release the occupied network resources of the first network, and the task of exchanging data between the first AI apparatus and the second AI apparatus is ended.

FIG. 23 is a schematic diagram of implementing data pass-through between different memories through a first network. It can be learned from FIG. 23 that the first network can be used as a new path for forwarding service data, and the service data cached in the first memory can be directly transmitted to the second memory through the first network.

In this optional manner, the service data can be exchanged between the first memory and the second memory through RDMA. Processing overheads of the first processor and the second processor are avoided, and the service data arrives at the second memory directly from the first memory, thereby accelerating service data exchange and improving service data exchange efficiency.
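The net effect of steps S2211 to S2214 is a one-sided write that never passes through either processor. The following toy simulation illustrates only that data path, with a bytearray standing in for each memory; it is not an RDMA verbs implementation, and all names are assumptions for this example.

```python
class Memory:
    """A toy byte-addressable memory standing in for DRAM."""

    def __init__(self, size):
        self.buf = bytearray(size)

    def read(self, addr, length):
        return bytes(self.buf[addr:addr + length])

    def write(self, addr, data):
        self.buf[addr:addr + len(data)] = data


def rdma_push(first_memory, second_memory, src_addr, dst_addr, length):
    # S2212: the first network interface reads the service data from
    # the first memory through the high-speed interconnect network.
    payload = first_memory.read(src_addr, length)
    # S2213/S2214: a one-sided RDMA write lands the data directly at
    # the destination address in the second memory; neither the first
    # processor nor the second processor touches the payload.
    second_memory.write(dst_addr, payload)


first_mem, second_mem = Memory(1 << 20), Memory(1 << 20)
first_mem.write(0x1000, b"face-image-batch-0")
rdma_push(first_mem, second_mem, src_addr=0x1000, dst_addr=0x8000, length=18)
assert second_mem.read(0x8000, 18) == b"face-image-batch-0"
```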

The embodiment in FIG. 22 shows a procedure of implementing RDMA data pass-through between memories in two storage devices through the first network. In some embodiments of this application, RDMA data pass-through between a hard disk in a source storage device and a memory in a destination storage device may also be implemented in a similar manner. The following provides a specific description by using an embodiment in FIG. 24.

Referring to FIG. 24, an embodiment of this application provides a data processing method. The method may be applied to the system 1 shown in FIG. 1. Interaction bodies of the method include a first AI apparatus in a first storage device and a second AI apparatus in a second storage device. The method includes the following steps.

S2411: The first AI apparatus sends a network resource request of the first network to a second AI processor, and sends a memory RDMA access request to the second processor.

S2412: The first AI apparatus reads the other service data from the hard disk in the first storage apparatus.

The first network interface in the first AI apparatus may access the hard disk in the first storage apparatus through a high-speed interconnect network, and read the other service data from the hard disk based on an address of the other service data in that hard disk.

S2413: The first AI apparatus sends the other service data to the second AI apparatus through the first network.

S2414: The second AI apparatus writes the other service data into the second memory.

FIG. 25 is a schematic diagram of implementing data pass-through between a hard disk and a memory in a storage device through a first network. It can be learned from FIG. 25 that the first network can be used as a new path for forwarding service data. The service data stored in the hard disk in the first storage apparatus can be directly transmitted to the second memory through RDMA over the first network, thereby implementing data pass-through between a hard disk in a source storage node and a memory in a target storage node.

In this optional manner, the service data can be exchanged between the hard disk in the first storage apparatus and the second memory through RDMA. Processing overheads of the first processor and the second processor are avoided, and the service data arrives at the second memory directly from the hard disk in the first storage apparatus, thereby accelerating service data exchange and improving service data exchange efficiency.

Optionally, the AI apparatuses may exchange an AI parameter with each other through a network resource of the second network. Specifically, another AI parameter is transmitted between the first AI apparatus and the second AI apparatus through the second network.

This function may be triggered in a plurality of cases. For example, when a quantity of network resources of the first network is less than a specified AI network resource threshold, the other AI parameter is transmitted between the first AI apparatus and the second AI apparatus through the second network. In addition, it can be understood that, even without the restrictive condition that the quantity of network resources is less than the specified AI network resource threshold, the other AI parameter may be transmitted between the first AI apparatus and the second AI apparatus through the second network.

In an optional embodiment, the first AI apparatus, the second AI apparatus, or a management apparatus may detect the quantity of network resources of the first network, determine whether it is less than the specified AI network resource threshold, and if so, transmit the other AI parameter through the second network. The first AI apparatus, the second AI apparatus, or the management apparatus may further detect the quantity of network resources of the second network, determine whether it is greater than a specified storage network resource threshold, and if the quantity of network resources of the first network is less than the specified AI network resource threshold while the quantity of network resources of the second network is greater than the specified storage network resource threshold, transmit the other AI parameter through the second network.

In an example application scenario, if all or most of the service data to be loaded by the first AI apparatus is located in the first storage apparatus, the first AI apparatus can load the service data locally, nearby. This avoids a large number of accesses to a remote storage device through the second network, so that local network resources of the second network are sufficient. However, because the first AI apparatus frequently exchanges the AI parameter with the remote AI apparatus, network resources of the first network are insufficient. In this scenario, the relatively sufficient network resources of the second network may be used as a new path for the AI parameter, thereby accelerating AI parameter exchange.

In this optional manner, a new path is provided for the AI apparatuses to exchange the AI parameter with each other. When network resources of the first network are insufficient, the second network is used to exchange the AI parameter. This increases the network bandwidth for transmitting the AI parameter, shortens the delay of exchanging the AI parameter, accelerates AI parameter exchange, and accelerates AI computing.

Optionally, AI memory pass-through between different storage devices may be implemented through RDMA. Specifically, referring to FIG. 26, an embodiment of this application provides a data processing method. The method may be applied to the system 1 shown in FIG. 1. Interaction bodies of the method include a first storage device and a second storage device. The method includes the following steps.

S2601: The first AI processor sends a network resource request of the second network to the second processor.

The first AI processor may generate the network resource request of the second network, and send it to a first network interface. The first network interface may send the network resource request of the second network to the second processor. The network resource request of the second network is used to request to occupy some network resources of the second network of a second AI apparatus to transmit another AI parameter. The request of the second network may carry a destination address of the other AI parameter.

The network resource request of the second network may be a memory RDMA access request, used to request access to a second AI memory through RDMA. The destination address carried in the request of the second network may be an address in the second AI memory. After receiving the request of the second network, the second processor may establish an RDMA path between a first AI memory and the second AI memory.

S2602: The first AI processor obtains the other AI parameter from the first AI memory.

S2603: The first AI processor transmits the other AI parameter to the second AI memory through RDMA over the second network.

The first AI processor may locally access the first AI memory to obtain the other AI parameter, and send it to the first network interface in the first AI apparatus. The first network interface in the first AI apparatus may send the other AI parameter to the first network interface in the second AI apparatus, which receives the other AI parameter and writes it into the second AI memory. Because RDMA is used, the second AI processor in the second AI apparatus does not need to participate in this process, thereby preventing processing resources of the second AI processor from being occupied, and improving forwarding performance.

FIG. 27 is a schematic diagram of implementing data pass-through between AI memories in two storage devices through a second network. It can be learned from FIG. 27 that the second network can be used as a new path for forwarding an AI parameter, and the AI parameter in the first AI memory in the first storage device can be directly transmitted to the second AI memory through RDMA over the second network, thereby implementing data pass-through between an AI memory in a source storage device and an AI memory in a target storage device.

In this optional manner, the AI parameter can be exchanged between the first AI memory and the second AI memory through RDMA. Processing overheads of the first AI processor and the second AI processor are avoided, and the AI parameter arrives at the second AI memory directly from the first AI memory, thereby accelerating AI parameter exchange and improving AI parameter exchange efficiency.

Optionally, if an AI apparatus performs training by borrowing a memory in a storage device, the memory in the storage device may cache an AI parameter. Memory pass-through between different storage devices may be implemented through RDMA, and AI parameters cached in memories in different storage devices may be exchanged through a second network. Specifically, referring to FIG. 28, an embodiment of this application provides a service data transmission method. Interaction bodies of the method include a first storage device and a second storage device.

The method includes the following steps.

S2801: The first AI processor sends a network resource request of the second network to the second processor.

S2802: The first AI processor obtains the other AI parameter from the first memory.

The first AI processor may access the first memory through a high-speed interconnect network, to obtain the other AI parameter cached in the first memory.

S2803: The first AI processor transmits the other AI parameter to the second memory through RDMA over the second network.

FIG. 29 is a schematic diagram of implementing AI parameter pass-through between memories in two storage devices through a second network. It can be learned from FIG. 29 that the second network can be used as a new path for forwarding an AI parameter, and the AI parameter in the first memory in the first storage device can be directly transmitted to the second memory through RDMA over the second network, thereby implementing AI parameter pass-through between a memory in a source storage node and a memory in a target storage node.

In this optional manner, the AI parameter can be exchanged between the first memory and the second memory through RDMA. Processing overheads of the first AI processor and the second AI processor are avoided, and the AI parameter arrives at the second memory directly from the first memory, thereby accelerating AI parameter exchange and improving AI parameter exchange efficiency.

Optionally, a storage device in which a dataset is located in the storage system may be scheduled by the management apparatus to perform AI computing. For details, refer to the following method embodiment.

FIG. 30 is a flowchart of a data processing method according to an embodiment of this application. The method may be applied to the system 1 shown in FIG. 1. Interaction bodies of the method include a management apparatus and a first storage device. The method includes the following steps.

S3001: The management apparatus receives a first job request.

The first job request is used to request to perform training based on a dataset. The dataset includes service data. The first job request may include an identifier of the dataset, used to indicate the dataset, for example, an ID or a name of the dataset. In some embodiments, a client may generate the first job request and send it to the management apparatus, and the management apparatus may receive the first job request from the client. The first job request may be triggered by an input operation of the user. For example, the client may display a graphical user interface; the user may enter the identifier of the dataset in the graphical user interface, and the client may receive the entered identifier of the dataset to generate the first job request.

S3002: The management apparatus determines distribution of a to-be-trained dataset based on the first job request.

The management apparatus may parse the first job request to obtain the identifier of the dataset carried in it, and query a mapping relationship between the identifier of the dataset and metadata of the dataset based on the identifier, to obtain the metadata of the dataset. The metadata of the dataset is used to indicate an address of the dataset. The management apparatus may determine the distribution of the dataset based on the metadata. For example, the management apparatus may query the storage device in which each piece of service data in the dataset is located, and generate a storage device list. The storage device list includes device identifiers, each identifying a target storage device that stores some or all of the service data in the dataset. Optionally, there may be a plurality of device identifiers in the storage device list, and the sequence in which they are arranged indicates the amount of data of the dataset stored in the corresponding target storage device. For example, in the storage device list, the target device corresponding to the first device identifier stores the largest amount of service data in the dataset, and the target device corresponding to the last device identifier stores the smallest amount.
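A minimal sketch of building such a storage device list is shown below, assuming the metadata has already been resolved into a mapping from device identifier to the number of bytes of the dataset each device stores; the mapping shape and names are assumptions for illustration.

```python
def build_device_list(dataset_id, metadata_index):
    placement = metadata_index[dataset_id]   # device id -> bytes of the dataset stored
    # The device storing the largest amount of the dataset comes first,
    # matching the ordering of the storage device list described above.
    return sorted(placement, key=placement.get, reverse=True)

index = {"faces-v1": {"dev1": 700 << 30, "dev2": 250 << 30, "dev3": 50 << 30}}
assert build_device_list("faces-v1", index) == ["dev1", "dev2", "dev3"]
```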

S3003: When determining that the service data is distributed on the first storage device, the management apparatus sends a first computing request to the first storage device.

The first computing request is used to request the first AI apparatus to perform AI computing on the service data. The first computing request may include the identifier of the dataset, which indicates that the service data in the dataset needs to be obtained. For example, if the distributed storage system includes N storage devices, a storage device 1 to a storage device N, where N is a positive integer, and the dataset is distributed on the storage device 1, the storage device 2, and the storage device 3, the management apparatus may select the storage device 1, the storage device 2, and the storage device 3 to perform AI computing, and send the first computing request to an AI apparatus of each of these three storage devices.

S3004: The first AI apparatus obtains the service data from the first storage apparatus in the first storage device based on the first computing request.

The first AI apparatus may parse the first computing request to obtain the identifier of the dataset carried in it, determine an address of the dataset in the first storage apparatus based on the identifier, and access the first storage apparatus through a high-speed interconnect network, to obtain all or some of the service data of the dataset stored in the first storage apparatus.

S3005: The first AI apparatus performs AI computing on the service data to obtain a first computing result.

In this optional manner, the management apparatus selects the first storage device, in which the service data is located, to provide AI computing, and the first storage device may obtain the service data through the first storage apparatus in the first storage device, to perform AI computing. This prevents the service data from moving across storage devices, avoids the delay caused by accessing another storage device to obtain the service data, shortens the delay of obtaining the service data, and accelerates AI computing.

Optionally, the management apparatus may determine whether a running status of the first storage device meets a specified condition, and if so, send the first computing request to the first storage device.

The running status may include one or more of free space of a memory in a storage device, free space of an AI memory, an occupation status of an AI computing power unit, an occupation rate of an AI processor, an occupation rate of a processor in the storage device, and network resource usage. The specified condition includes any one of the following conditions (1) to (5) or a combination thereof (as shown in the sketch after this list):

(1) the free space of the memory in the storage device is greater than a specified space threshold;

(2) the free space of the AI memory is greater than a specified space threshold;

(3) the occupation rate of the AI processor is less than a specified occupation rate threshold;

(4) the occupation rate of the processor in the storage device is less than a specified occupation rate threshold; and

(5) the network resource usage is less than a specified occupation rate threshold.
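A minimal sketch of evaluating these conditions is given below. The dictionary field names are assumptions, and this particular function combines all five checks, whereas, as stated above, any single condition or combination of them may be used.

```python
def meets_condition(status, limits):
    """Evaluate conditions (1) to (5) above for one storage device."""
    return (status["mem_free"] > limits["space"]              # (1)
            and status["ai_mem_free"] > limits["space"]       # (2)
            and status["ai_cpu_load"] < limits["occupation"]  # (3)
            and status["cpu_load"] < limits["occupation"]     # (4)
            and status["net_usage"] < limits["occupation"])   # (5)

status = {"mem_free": 8 << 30, "ai_mem_free": 16 << 30,
          "ai_cpu_load": 0.2, "cpu_load": 0.3, "net_usage": 0.1}
assert meets_condition(status, {"space": 1 << 30, "occupation": 0.8})
```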

In this optional manner, it can be ensured that the selected first storage device is not currently occupied and can provide AI computing power, so as to avoid excessively high device overheads and an AI computing task that cannot be completed in time because an occupied storage device was selected to perform AI computing.

Optionally, if the storage device in which the dataset is located is already occupied, the management apparatus may schedule a storage device that is in the storage system and close to the dataset to perform AI computing. Specifically, referring to FIG. 31, an embodiment of this application provides a flowchart of a data processing method. The method may be applied to the system 1 shown in FIG. 1. Interaction bodies of the method include a management apparatus, a first storage device, and a second storage device. The method includes the following steps.

S3101: The management apparatus receives a second job request.

Step S3101 is similar to step S3001. Details are not described herein again.

S3102: The management apparatus determines distribution of a to-be-trained dataset based on the second job request.

Step S3102 is similar to step S3002. Details are not described herein again.

S3103: When the other service data is distributed on the second storage device in the plurality of storage devices, the management apparatus further determines whether a running status of the second storage device meets a specified condition.

S3104: When the running status of the second storage device does not meet the specified condition, the management apparatus sends a second computing request to the first storage device, where a distance between the first storage device and the second storage device is less than a specified distance threshold.

In some embodiments, for each storage device in the distributed storage system, the management apparatus may determine a running status of the storage device and a distance between the storage device and the second storage device, and then determine, based on these, a storage device whose running status meets the specified condition and whose distance to the second storage device is less than the specified distance threshold, to obtain the first storage device.

In a possible implementation, the management apparatus may determine the first storage device through cost-based optimization (Cost-Based Optimization, CBO for short). For example, for each storage device in the distributed storage system, the management apparatus may compute a cost value of the storage device based on the running status of the storage device, the distance between the storage device and the second storage device, and the amount of data of a second dataset stored in the storage device. The cost value indicates the overheads of enabling the storage device to perform AI computing. The management apparatus may select, based on the cost values, a storage device whose cost value meets a specified condition to serve as the first storage device. Under this algorithm, the cost value of the storage device closest to the dataset is small, so that device may be selected to start AI computing. For example, a weight x1 may be assigned to local storage and a weight y1 to remote storage, where x1 is less than y1. For any storage device, if the amount of data of the second dataset stored in the storage device is x, and the amount of data of the second dataset not stored in the storage device is y, a weighted summation is performed on x and y based on the weight x1 and the weight y1, and the obtained weighted sum is used as the cost value of the storage device. The storage devices are sorted in ascending order of cost values, and if n storage devices are required for AI computing, the first n storage devices are selected from the sorting result. For example, the weight of locally loaded data may be set to 1, and the weight of remotely loaded data may be set to 10. If a particular storage device stores 30% of the dataset and the remaining 70% is not stored in the storage device, the cost value of the storage device is 30%×1+70%×10=7.3.
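The weighted-sum computation can be reproduced in a few lines. The sketch below uses the example weights 1 and 10 from the paragraph above; the function and parameter names are illustrative assumptions.

```python
def cost_value(local_fraction, local_weight=1, remote_weight=10):
    # Weighted sum over the locally stored and remotely stored shares.
    return local_fraction * local_weight + (1 - local_fraction) * remote_weight

def pick_devices(local_fractions, n):
    """Select the n storage devices with the smallest cost values."""
    ranked = sorted(local_fractions, key=lambda dev: cost_value(local_fractions[dev]))
    return ranked[:n]

# A device storing 30% of the dataset: 30% x 1 + 70% x 10 = 7.3.
assert abs(cost_value(0.30) - 7.3) < 1e-9
assert pick_devices({"dev1": 0.30, "dev2": 0.65, "dev3": 0.05}, 2) == ["dev2", "dev1"]
```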

Optionally, after selecting the first storage device, the management apparatus may output a recommendation list to the client. The recommendation list indicates the storage devices that the management apparatus recommends for starting AI computing, and may include an identifier of the first storage device.

S3105: The first AI apparatus obtains the other service data from the second storage device based on the second computing request.

S3106: The first AI apparatus performs AI computing on the other service data to obtain a second computing result.

In this optional manner, if the storage device in which the dataset is located is already occupied, the management apparatus can select a storage device that is close to the dataset to provide AI computing. This shortens the dataset transmission distance and reduces cross-node service data movement.

The following describes an example of an application scenario of this embodiment of this application.

With the application of deep learning, a dataset including a large amount of data is usually required for model training, to obtain a neural network through fitting. For example, FIG. 32 shows a logical procedure of model training. The model training may include a model loading phase, a data loading phase, a parameter initialization phase, a forward propagation phase, a loss computing phase, a backward propagation phase, a parameter update phase, and a weight saving phase. Because a large amount of iterative training is required to obtain the model parameters, and iteration may be performed for hundreds to tens of thousands of times, a large amount of data needs to be loaded to perform parameter update and exchange, and consequently model training is very time-consuming. It follows that fully using software and hardware resources, improving scheduling performance, and optimizing the data transmission path are very important for model training.
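To make the phases concrete, the following runnable toy fits a one-parameter linear model by gradient descent, with each FIG. 32 phase marked in comments; the model, the data, and the hyperparameters are invented purely for illustration.

```python
def train(samples, epochs=100, lr=0.01):
    """One-parameter linear model fitted by per-sample gradient descent."""
    w = 0.0                                  # parameter initialization
    for _ in range(epochs):                  # iterative training
        for x, y in samples:                 # data loading
            pred = w * x                     # forward propagation
            err = pred - y                   # loss computing (squared error)
            grad = 2 * err * x               # backward propagation
            w -= lr * grad                   # parameter update (the exchanged AI parameter)
    return w                                 # weight saving

# y = 3x: the fitted weight should approach 3.
samples = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
print(round(train(samples), 2))              # ≈ 3.0
```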

According to the methods provided in the embodiments of this application, each phase of model training can be optimized.

Specifically, in the model loading phase, the methods provided in the embodiments in FIG. 30 and FIG. 31 may be used to provide near-data AI training: a near-data AI apparatus may be selected as a training node based on the address of the data required for model training, to start training.

In the data loading phase, the methods provided in the embodiments in FIG. 7, FIG. 8, FIG. 9, FIG. 12, and FIG. 14 may be used to shorten the service data transmission path and perform near-storage data read/write. In addition, high-speed interconnection between a memory and an AI memory can be used to implement RDMA pass-through between the memory and the AI memory.

In the forward propagation phase, the method provided in the embodiment in FIG. 18 may be used to implement collaboration between the computing power of a processor and the computing power of an AI processor based on the high-speed interconnect network, thereby accelerating AI operator computing.

In the parameter update phase, the methods provided in the embodiments in FIG. 26 and FIG. 28 may be used to establish an RDMA path between AI memories in different storage devices and an RDMA path between memories in different storage devices, to transmit an AI parameter through the RDMA path, thereby accelerating parameter exchange.

FIG. 33 is a logical architecture diagram of AI training according to an embodiment of this application. A user may perform an operation on a client and enter an AI training job. The client generates a request for indicating to perform the AI training job, invokes a job submission application programming interface (Application Programming Interface, API), and sends the request to an AI job management service. After receiving the request, the AI job management service parses the job carried in the request, and sends a resource request to the management apparatus. After the management apparatus receives the resource request, a scheduler in the management apparatus selects an appropriate storage device from a managed bottom-layer physical resource pool, and starts a corresponding AI training job on an AI apparatus in the selected storage device. The computing power of the AI apparatus in the selected storage device is occupied by the AI training job, and is released after the AI training job is completed.

All the foregoing optional technical solutions may be randomly combined to form optional embodiments of this application. Details are not described herein again.

All or some of the foregoing embodiments may be implemented by using software, hardware, firmware, or any combination thereof. When the software is used to implement the embodiments, all or some of the embodiments may be implemented in a form of a computer program product. The computer program product includes one or more computer program instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are all or partially generated. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or may be transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer program instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (digital video disc, DVD)), a semiconductor medium (for example, a solid-state drive), or the like.

A person of ordinary skill in the art may be aware that the method steps and units described in the embodiments disclosed in this specification may be implemented by electronic hardware, computer software, or a combination thereof. To clearly describe the interchangeability between hardware and software, the foregoing has generally described the steps and compositions of the embodiments according to functions. Whether the functions are performed by hardware or software depends on particular applications and design constraints of the technical solutions. A person of ordinary skill in the art may use different methods to implement the described functions for each particular application, but it should not be considered that such an implementation goes beyond the scope of this application.

It can be clearly understood by a person skilled in the art that, for the purpose of convenient and brief description, for a detailed working process of the foregoing described system, apparatus, and unit, refer to the corresponding process in the foregoing method embodiments. Details are not described herein again.

In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the foregoing described apparatus embodiment is merely an example. For example, the unit division is merely logical function division and may be other division during actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces, and the indirect couplings or communication connections between the apparatuses or units may be connections in an electrical form, a mechanical form, or another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual requirements to achieve the objectives of the solutions of the embodiments in this application.

In addition, function units in the embodiments of this application may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in a form of hardware, or may be implemented in a form of a software function unit.

When the integrated unit is implemented in a form of a software function unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, or all or some of the technical solutions may be implemented in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, or a network device) to perform all or some of the steps of the method described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, for example, a USB flash drive, a removable hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any modification or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

A person of ordinary skill in the art may understand that all or some of the steps of the embodiments may be implemented by hardware or a program instructing related hardware. The program may be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

The foregoing descriptions are merely optional embodiments of this application, but are not intended to limit this application. Any modification, equivalent replacement, or improvement made without departing from the spirit and principle of this application shall fall within the protection scope of this application.

What is claimed is:
 1. A device, comprising: a central processing unit (CPU); a hard disk; and a graphics processing unit (GPU) coupled to the CPU and the hard disk, wherein the hard disk is configured to store first service data, and wherein the GPU is configured to: retrieve the first service data from the hard disk bypassing the CPU, and perform computing on the first service data.
 2. The device according to claim 1, wherein the GPU includes a memory, and wherein the GPU is further configured to write the first service data into the memory after retrieving the first service data.
 3. The device according to claim 2, wherein the storage device includes a cache for temporarily storing data to be processed by the CPU, wherein the cache stores second data, and wherein the GPU is further configured to retrieve the second data stored in the cache through the CPU, and store the second data in the memory of the GPU.
 4. The device according to claim 1, wherein the GPU communicates with the CPU through peripheral component interconnect express (PCIe).
 5. The device according to claim 1, wherein the GPU communicates with the hard disk through peripheral component interconnect express (PCIe).
 6. The device according to claim 1, wherein the hard disk includes a solid state disk (SSD).
 7. A method for accessing data implemented by a graphics processing unit (GPU) that is coupled to a hard disk and a central processing unit (CPU), comprising: retrieving first service data from the hard disk bypassing the CPU; and performing computing on the first service data.
 8. The method according to claim 7, wherein the GPU includes a memory, the method further comprising: writing the first service data into the memory after retrieving the first service data.
 9. The method according to claim 8, wherein the storage device includes a cache for temporarily storing data to be processed by the CPU, wherein the cache stores second data, the method further comprising: retrieving the second data stored in the cache through the CPU, and storing the second data in the memory of the GPU.
 10. The method according to claim 7, further comprising: communicating with the CPU through peripheral component interconnect express (PCIe).
 11. The method according to claim 7, further comprising: communicating with the hard disk through peripheral component interconnect express (PCIe).
 12. The method according to claim 7, wherein the hard disk includes a solid state disk (SSD).
 13. A graphics processing unit (GPU), comprising: a processor; and a memory coupled to the processor, wherein the processor is configured to: retrieve service data from a solid state disk (SSD); write the service data into the memory; and perform computing on the service data stored in the memory.
 14. The GPU according to claim 13, wherein the processor is further configured to bypass a central processing unit (CPU) of a server including the GPU when the GPU retrieves the service data.